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ABSTRACT 


The task of selecting the best set of spectral channels 
is vital to the design of multispectral remote sensor 
systems. i'r would be desirable to choose a sensor design 
such that the entire pattern recognition system performs 
in an optimal manner. In order to choose a design which will 
be optimal for the largest class of remote sensing problems, 
a method is developed which attempts to represent the 
spectral response function from a scene as accurately as 
possible. The performance of the overall recognition 
system, then, is studied relative to the accuracy of the 
spectral representation. The spectral representation is 

only one of a set of five interrelated parameter cate- 

\ 

gories which also includes the spatial representation 
parameter, the signal-to-noise ratio, ancillary data, and 
information classes. 

The spectral response functions observed from a stratum 
are modeled as a stochastic process with a Gaussian proba- 
bility measure. The criterion for spectral representation 
is defined by the minimum expected mean-square error. The 
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sensor is modeled as a set of basis functions such that the 
output approximates the input by a linear combination of 
the basis functions suitably weighted by a sequence of 
coefficients. The optimum set of basis functions with 
respect to the mean-square error criterion is given by the 
solutions to the Karhunen-Loeve expansion. The development^, 
of the Karhunen-Loeve expansion was generalized to include 
a weight function such that each point in the spectral 
interval could be assigned a weight corresponding to its 
importance. The computation of the optimum set of basis 
functions is incorporated into an analytical procedure that 
seeks to design practical sensors, comparing their per- 
formance against the optimal design. 

The five parameter categories are discussed with regard 
to their effect on the pattern recognition system perform- 
ance. The usefulness of the graph of the recognition system 
performance as a function of spectral representation is 
introduced. 

A software system is developed to test and evaluate 
this method using field measurements data taken from two 
locations on three different dates each. Four different 
weight functions are evaluated. The effect of sample size 
on the evaluation of a data set is demonstrated. For each 
stratum the first few eigenvectors are plotted, and the 
mean-square error and probability of correct classification 
are evaluated. The graphs of probability of correct 
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classification vs. expected mean-square error allow the 
study of the relationship between classification performance 
and spectral representation. One can also study the 
dimensionality of the observation space relative to repre- 
sentation and performance. 

The procedure is demonstrated to be a valuable tool for 
the design of sensors for the limited collection of data. 

The value of the weighted Karhunen-Loeve expansion is 
demonstrated. The performance of several suboptimal sensors 
are compared with the optimal design. A proposed suboptimal 
sensor is designed which demonstrates superior performance 
in representation accuracy and classification accuracy 
over the other suboptimal sensors. It is shown that spec- 
tral sampling should be done using spectral channels which 
have a smaller bandwidth, particularly in the red part of 
the visible region, than are currently being used on 
operational sensor systems. 



CHAPTER 1 . INTRODUCTION 


Earth observational remote sensing has emerged as a 
prominent technology in the last two decades. Important 
developments in sensor technology/ computer systems, pattern 
recognition theory, and image processing techniques have 
brought the remote sensing state-of-the-art to the point 
where it is a powerful tool for studying earth resources. 

With the launching of the Landsat satellites and advanced 
automated processing of the image data, worldwide monitoring 
of the earth's surface for locating and utilizing natural 
resources is now a reality. 

1.1 The Pattern Recognition System 

A basic tool for remote sensing is pattern recognition. 
From a systems perspective the components of a pattern recog- 
nition system can be placed into three distinct blocks - the 
scene, the sensor, and the processor (Figure 1.1). The 
scene includes everything in front on the sensor. The 
information in the scene is contained in the spectral, spatial 
and temporal variations of the electromagnetic energy that 
is either reflected by or emitted from the earth's surface 
and passes through the atmosphere. The sensor measures the 
received electromagnetic energy and prepares the measurements 
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for transmission to the processor. The processor digitally 

implements a set of algorithms for classification and image 

" 

processing as required by the user or analyst. 

In this research we focus on the sensor subsystem and 
develop an analytical technique for selecting certain parame- 
ters for the sensor design. Because of the interrelationship 
with other parts of the pattern recognition system, the 
sensor design problem will be considered as a part of the 
integrated overall system design problem. That is, sensor 
design choices will be made on the basis of overall system 
performance. Therefore, we begin with a more detailed 
discussion of the parts of the system and how they interface 
with each other. 

1.1.1 The Scene 

A distinctive characteristic of the scene is that it 
is not under the control of the system designer or the 
analyst. In fact, the intent of remote sensing is to observe 
and learn as much about the scene as possible without modify- 
ing it or affecting it in any way. 

For current earth observational remote sensinq problems, 
the information bearing signal is the spectral response 
function x(A,r,s,t). The parameters of this function are 
the wavelength. A, the spatial coordinates, r and s, and 
time, t. Historically, the desired information has been 
extracted primarily from the spectral variations (Holmes 
and MacDonald, 1969). to which we will limit ourselves, here, 
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although significant progress is being made in extracting 
information from spatial (Kettig and Landgrebe, 1976; 
Haralick et al, 1973; Wiersma and Landgrebe, 1976) and 
temporal variations (Swain, 1978) . A typical spectral 
response function for green vegetation at a fixed location 
and time is shown in Figure 1.2. The interval of interest, 
A, typically includes the visible and infrared regions of 
the electromagnetic spectrum from 0.4 micrometers to 2.4 
micrometers . 

The scene is very dynamic and complex. Changes in sun 
angle, atmospheric conditions, climate, cover type, and a 
variety of other variables can produce significant changes 
in the spectral response. Instead of trying to account for 
each of the variables that affect the spectral response, we 
choose to model the scene as a stochastic process. The 
complete characterization of this process model is not known 
a priori. In order to obtain this knowledge, one observes 
the scene over a period of time, an area of space and an 
interval of the spectrum and estimates the parameters from 
the observations which are necessary to complete the 
characterization.. ' 

It is generally necessary to group the observations 
taken from the scene into classes. For purposes of classi- 
fying the data into distinct classes it is .required that 
the class list have the following properties simultaneously 
(Landgrebe, 1978) : 

- Each class must be of interest to the user, 
i.e. of informational value 


arujcna-ozvxu 
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- The classes must be separable in terms of the 
features available 

- The list must be exhaustive, i.e., there must be 
a class to which it is logical to assign each 
pixel in the scene 

1.1.2 The Sensor 

The function of the sensor is to transform the con- 
tinuous parameter functions x(A) into a finite number, N, 
of measurement values (x^, X 2 , . .., X^) called features. 
Ideally, the sensor would be under the control of the system 
user who could then optimize the sensor and the processor 
for a specific application. However, in practice the sensor 
is a complex, expensive system which is designed infre- 
quently. Control over the sensor, consequently is the 
responsibility of the system designer rather than the user. 
The system designer cannot optimize the sensor for a 
particular location, time, and application but must create 
a single instrument which must serve a broad spectrum of 
users and applications over a number of years. 

A basic sensor is shown schematically in Figure 1.3. 

The system components can be placed into five blocks - the 
collector (typically a set of optics) which collects the 
electromagnetic energy over the spectrum of interest, the 
scanning mechanism which controls the pointing of the 
collector, a spectral dispersing device, the detectors which 
convert electromagnetic energy into electrical signals, 
and the signal processing unit. The sensor is mounted on 






Figure 1.3 Sensor system block diagram. 
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a platform such as an aircraft or spacecraft. 

The operation of the sensor can be expressed mathemat- 
ically as the representation of the waveform x(A) by a set 
of functions (<{k(A)}. The original waveform is approxi- 
mated by the series expansion 

N 

*(x) ? I x.^.U) a.i) 

i=i 

where the 4 ^ (A) represent the spectral sensitivity (as a 
function of A) for one feature of the sensor system and the 
x^ are the coefficients in the expansion and the measurement 
values which will be used by the processor. Each x^ is 
obtained by using the linear functional 


x. 

1 


x( A) <fr. ( A ) dA 


( 1 . 2 ) 


1.1.3 The Processor 

The measurement values {x, , x„ , . .., x ) from the 

l z N 

sensor become the input to the processor which typically 
contains a digitally implemented classif ication algorithm 
A comprehensive list of all the processors that have 
been implemented would require a monumental effort to com- 
pile. Texts such as Nilsson (1965) , Fukunaga (1972) , and 
Duda and Hart (1973) describe some basic classifiers which 
may be adapted for specific applications. The point, here, 
is that the system designer and the analyst have great 
flexibility in choosing the processor. 


An important step in the design of a classifier is the 
training phase. After the classification algorithm has been 
selected, the parameters required by the algorithm must be 
determined in order to obtain good performance. The process of 

selecting these parameters is the training phase. To train 
the classifier a set of (presumably correctly) class-labeled 
samples from the scene are selected from which the necessary 
parameters can be computed. 

During the design procedure it is important to specify 
the performance of the system which implies that a measure 
of the performance must be defined. The global performance 
criterion, e , is a function of many system parameters. 

E 0 = f(a) (1.3) 

The list of system parameters is indicated by a. This func- 
tion is so complex that the straightforward analysis and 
subsequent optimization of the whole system with respect to 
is not trivial. What seems more appropriate is to list 
the parameters in order of importance and investigate the 
effect of each on the performance. 

Landgrebe (1978) has listed five general parameter 
categories which affect the system design. These are: 

- Spectral representation 

- Spatial representation 

- Signal-to-noise ratio 

- Ancillary data 

- Information classes 
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It is important to recognize that these parameters are inter- 
related; hence, a change in one parameter may affect the 
value of one or more of the other four. In this research 
we will be concerned primarily with the spectral representa- 
tion; however, the other four parameters will play a neces- 
sarily important role in the analysis. 

1.2 Previous Approaches to Sensor Design 

There have been basically three approaches to selecting 
spectral bands for multispectral scanner design. They are 
1) in-depth studies of physical considerations, 2) empirical 
methods, and 3) simulators. All three of these approaches 
have contributed to our knowledge of the scene and to the 
design and development of present-day scanner systems. 

Important physical considerations which have been 
studied are atmospheric effects and the interaction of light 
with various cover types. Atmospheric effects include 
scattering and absorption by water vapor, carbon dioxide, 
and ozone (Korb, 1969; Hulstrom, 1974). By evaluating the 
transmittance of the atmosphere over the spectral interval 
of interest, one can eliminate certain portions of the 
interval, since little or no energy will reach the sensor. 
Scattering effects are less pronounced but are important 
for consideration. 

Studies have been done to investigate the interaction 
of electromagnetic radiation with plant leaves to determine 
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regions of the 'spectrum which will be useful for identifying 
vegetation and determining plant. stress (Harnage, 1975; 

Gates et al, 1971). On a larger scale the interaction of 

* « 9 

light with a plant canopy has been studied which takes into 
consideration the effects of leaf size, plant size, and plant 
density (Colewell, ' 1974) . Similar studies have been done 
with soils (flay and Peterson, 1975; and Montgomery, 1976) 
and with water temperatures (Bartolucci, 1977). A typical 
procedure for these studies is to take measurements with a 
spectroradiometer on a restricted set of information 
classes over the entire spectrum. For a single observation 
a single spectral response function is recorded. The average 
response is taken over a small number of samples and con- 
clusions are drawn from the average. It is important to 
note that over a collection of these spectral response 
functions, the functions vary significantly about the mean. 
Furthermore this variation is potentially information 
bearing. This information is lost if one considers only 
the mean response function. 

The second approach is empirical in that a scanner 
with many spectral bands is constructed, and the selection 
of the bands is done experimentally. Examples of experiment- 
al scanners which have been constructed are the Michigan 
scanner (Kassel et al, 1974) and the USDS scanner (Zaitzeff 
et al, 1971) . The spectrum is sampled using on the order 
of 10-30 spectral channels (12 for Michigan scanner 24 for 
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MSDS) which are thought to be of interest. Data is collected 
using the scanner, and processing is performed to evaluate 
the channels over a variety of scenes. Some examples of 
empirical studies which have been done are Landgrebe et al 
(1977) with agricultural cover types, Coggeshell and Hoffer 
(1973) with forest covers, and Vincent and Thompson (1972) 
with geological applications. This empirical approach has 
the advantage of retaining the information in the variations 
about the mean since a large number of samples can be 
collected. However, the spectral sampling is crude and 
incomplete for representing the whole spectrum. 

Simulators have been developed to generate typical 
spectra according to a scene model. The artificial spectral 
response functions can then be used to evaluate spectral 
bands. A system which has been set up to simulate multi- 
spectral data is described by Malila et al (1977) . At this 
time there is not sufficient understanding of the scene to 
be able to develop and use accurate models. 

One additional research effort due to Wiswell (1978) 
which differs from the previous approaches deserves mention- 
ing. The purpose was to extract information from a scene 
using the entire spectral interval. The criterion of 
average mutual information was proposed which is a measure 
of the reduction in uncertainty about the scene after the 
observation has been made. This information theoretic 
technique was used to evaluate spectral bands on the basis 
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of maximum average mutual information. An autoregressive 
stochastic process model was used for the scene. Consider- 
able effort was expended in developing and testing this model 
the parameters of which were then used to compute average 
mutual information. While spectral bands were evaluated on 
the basis of the information criterion, the relationship 
between average mutual information and some global per- 
formance criterion such as classification accuracy was not 
demonstrated. 

In this research it will be desirable to incorporate 
the positive features of past approaches and build on the 
knowledge that has been gained through them. We would like 
to extract information from the entire continuous spectral 
interval of interest rather than from the coarse sample of 
the interval provided by experimental scanners. A large 
collection of spectral response functions taken from field 
measurements of the scene will be utilized in order to take 
into consideration the variability of the data over the 
scene as well as the average values. A parametric sto- 
chastic model will be assumed which has been well studied. 
The complete characterization will be learned from observa- 
tions of the real data. 

An important consideration is the choice of criterion 
upon which the sensor system will be designed. The choice 
of a global performance criterion, such as probability of 
correct classification, seems attractive, since the overall 


VJ! 

14 

performance of the pattern recognition system is ultimately 
what we wish to optimize. Usually one would like to 
maximize the probability of correct classification; howevbr, 
in most cases the integral involved is too complex to 
admit an analytic solution. Efforts have been made 
in this direction primarily drawing on results from the 
literature of feature selection. There have been three 
basic approaches along this line: 1) optimization of a 

separability measure, 2) discriminant analysis, and 3) 
principal component analysis. 

Separability measures of the statistical distance be- 
tween two class distributions are numerical quantities which 
are simpler to compute than classification performance and 
which provide bounds on the performance. The divergence 
(Marill and Green, 1963) and the Bhattacharyya distance 
(Kailath, 1967) are two well-known examples of separability 
measures. Wacker and Landgrebe (1971, Table 2 . 4 . 2) provide 
a listing of many of the separability measures that have 
been proposed in the literature. The approach is to select 
the set of spectral channels which is optimal with respect 
to the separability measure. Typically, a search procedure 
is used to arrive at the best choice of spectral channels 
(Tou and Heydorn, 1967; Whitsitt and Landgrebe, 1977; 

Kadota and Shepp, 1967; and Caprihan and deFigueiredo , 

1970) . Note that Kadota and Shepp (1967) and Caprihan and 
deFigueiredo (1970) are extracting information from continuous 


functions. Since the separability measure provides at best 
only a bound on the classification performance, it cannot be 
guaranteed that the channels which optimize the separability 
measure will necessarily optimize the system performance. 

In discriminant analysis one attempts to find some mea- 
sure of the ratio of the between class separability to the 
wi thin-class separability (Fukunaga, 1972; Foley and Sammon, 
1975) . The spectral channels are selected to maximize this 
ratio. One can observe intuitively that maximizing the ratio 
would improve the performance; however, it cannot be guar- 
anteed that the chosen set is optimum with respect to the 
global performance criterion. 

The method of principal components is a statistical 
procedure which reduces the number of variables to be analyzed 
to a manageable number (Anderson, 1962; Dempster, 1969) . 
Principal vectors are found such that the variable in the 
first principal vector has maximum statistical variance, and 
so forth. A variation on principal component analysis which 
has found considerable application is that of cannonical 
correlation (Dempster, 1969) . It cannot be assured that 
once the principal vectors have been selected, the global 
performance criterion is optimized. 

Although each of the feature selection procedures 
described above has been demonstrated to be practical in 
spite of the lack of a tight relationship to the global 
criterion, the approach that is proposed here will take 
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a different direction. Once the sensor has been designed, 
built, and placed into service, it should perforin optimally 
over all possible scenes for any possible choice of pro- 
cessor. The difficulty with the methods for optimizing 
the global criterion is that one does not know the specific 
processor and set of classes of a problem at the time the 
sensor is designed. Optimizing the choice of basis func- 
tions with respect to a global criterion for a specific set 
of classes in general may yield poor results with the same 
basis functions on a different set of classes. Furthermore, 
the global performance criterion was described as being a 
complex function of many parameters (1.3). If we choose as 
our criterion some measure of the quality with which the 
output of the sensor represents the input, we can optimize 
the criterion while holding parameters from the other four 
categories fixed. The relationship between the spectral 
representation criterion and the global performance criterion 
can be evaluated for typical remote sensing problems. 

1.3 Present Investigation 

In Chapter 2 a procedure is developed to analytically 
select spectral channels for a sensor system. The collec- 
tion of spectral response functions makes up the stochastic 
process. A representation technique based on the Karhunen- 
Loeve expansion which jninimizes the criterion of mean-square 
representation error is developed. This technique is 


generalized to include a priori weighting information which 
may be available. This weighting method has been termed 
the weighted Karhunen-Loeve expansion. 

The Karhunen-Loeve expansion is attributed to Karhunen 
(1947) and Loeve (1963) and is used extensively in the 
stochastic process literature as a technique for represent- 
ing stochastic processes (Davenport and Root, 1958? Wong, 

1971) . In the pattern recognition literature the Karhunen- 
Loeve expansion historically has been used as a feature 
selection technique (Watanabe, 1965? chien and Fu, 1967; 

Fu, 1968; Fukunaga, 1970; Kittler and Young, 1973) . 

The parameters and their influence on the global per- 
formance criterion are discussed in Chapter 3. The princi- 
pal parameter in this research is the spectral representation; 
hence, the relationship between the spectral representation 
parameter and the probability of correct classification is 
developed. The ancillary data, information classes, spatial 
representation, signal-to-noise ratio, and the interrelation- 
ships between these parameters are also discussed. 

An experimental software system which implements the 
procedure that was developed in Chapter 2 is described and 
evaluated in Chapter 4. Results from tests of the system 
are presented and discussed. 

In Chapter 5 some conclusions from the results are 
presented and suggestions are made for further work. 
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CHAPTER 2. SCENE REPRESENTATION AND SPECTRAL 

PARAMETER DESIGN 


In this chapter an analytical procedure is developed 
to perform the spectral parameter design for a sensor system 
bapable of operating as an integral part of any potential 
pattern recognition system. Due to the complexity of 
the scene, a stochastic process model is used to describe 
the scene. The theory necessary to support the procedure 
is developed for the case where the spectral response 
functions are square-integrable functions of the continuous 
parameter A. Due to practical consideration for measuring 
real data in the field and performing computations on a 
digital computer, a discrete approximation is developed, 
and the potential error due to the approximation is discussed. 

2.1 The Analytical Procedure 

Consider a pattern recognition system where the scene 
which is being observed by the sensor is some portion of 
the earth's surface S Q . It may be desirable to design a 
sensor such that S D is some subarea, for example, the land 
surfaces or a particular nation within its territorial 
boundaries. The area defined by the geographical boundaries 
of S 0 can be subdivided into areas called strata. We define 
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a stratum S c s Q as the largest contiguous set of points 
{seS} which can be classified to an acceptable accuracy 
with a single training of the pattern recognition algorithm. 

The sensor model will be a set of basis functions 
(tjn(A)} on the interval A (Figure 2.1). These functions 
are. essentially filters which have weighted passbands in 
differing portions of the spectral interval. The approxi- 
mation of a function x(A) by a set of four rectangular basis 
functions is illustrated in Figure 2.2. 

The processor for the pattern recognition system will 
be denoted by P(A, z, e Q ) , where A represents the set of 
algorithms used in the processor, z is the output of the 
system, and e Q is the system performance criterion with 
respect to z. The set A may include feature selection and 
classification algorithms. The output z may be a map, a 
table or some other presentation of the desired information. 

In order to define a remote sensing problem, the 
analyst decides upon an objective. Depending on the objective 
the analyst will specify the components of the pattern 
recognition system S " , {if; (A)},. and P(A, z, e Q ) . Quite 
often the objective dictates which subset, S'', of S Q will be 
used. As described in Chapter 1, the sensor (i|^(A)} is 
designed infrequently and once put into service remains 
fixed. The output z is based upon what information is 
desired from the scene to achieve the objective. 


Sensor 








Figure 2.2 Approximation of the spectral response 

function by a set of four basis functions 
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The ultimate goal before us is to select a sensor 
design (iJk(X)} which can be used on any subset S" of S Q 
and will provide, as nearly as possible, optimal performance 
for any choice of processor P(A, z, e Q ) . 

To achieve the design goal an analytical procedure 
is set forth which incorporates the design of a theoretically 
optimal sensor against which the performance of candidate- 
practical sensors can be compared. The following procedure 
is proposed (see Figure 2.3): 

1. Based on the intended use of the sensor system, the 
collection of strata comprising S q is specified. Because 
of the infinite number of possible strata in S Q , 
only a finite number G of subsets {S^} which are 
representative of the entire collection S Q will be 
used to evaluate the sensors. 

2. An initial candidate sensor system is specified 
by defining a set of basis functions {iJm(A)}. 

At appropriate steps in this procedure the set 
of basis functions may be modified to improve 
the performance. 

3. In steps 4 through 7 each stratum , i=l, 2, .., G 
will be considered in sequence. If it is necessary 
at any stage to modify i|^(A), the sequence should 
be repeated to insure that the desired performance 
is obtained over all of S Q . 


( 



Figure 2.3 Flowchart of the design procedure. 
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4. For design a sensor which will be optimum with 
respect to some criterion for any possible choice 
of processor. This optimal sensor {<I>^(A)} will 
serve as a standard by which one can compare the 
performance of the candidate system. 

■ -*f 

5. Evaluate the performance of the optimal sensor based j 

on the criterion of optimality over the stratum S^. :j 

6. Evaluate the performance of the candidate sensor j 

over the same stratum using the criterion. >■ 

7. Compare the performance of the candidate sensor 

with the optimal design relative to the criterion. J 

If the performance of the candidate sensor {i|^(A)} 
is very nearly the same as the optimum, then, the 
proposed sensor is adequate for the stratum under 
consideration. In this case the next stratum in 
the sequence is fetched and we return to step 4. 

In the event that all of the strata have been used 
the procedure halts. If the candidate system's 
performance is substantially below that of the 
optimum, it will be necessary to modify our choice 
of the set {^(A)} and return to step 6. The set 
of optimal basis functions { 4> ^ ( A ) } can be used to 
provide an aid for modifying {ijh ( A) } . 

The critical step in this procedure is the design of 
the optimum sensor, and will be the principal step to which 


this research will be addressed. The criterion for 
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optimality is an important quantity, and must be dealt with 
carefully. An optimality criterion has the dual role of 
providing the measure of performance which will be opti- 
mized as well as providing a standard for comparison of 
suboptimal systems (Middleton, Sect. 2.3.4, 1960). 

The optimal sensor system design will be optimum in 
the following sense. If one has the entire function x(A) 
at his disposal, a processor which is optimum with respect 
to a global performance criterion e Q may be designed. If 
x(A) is the approximation by the sensor to the waveform 
x( A) , then a fidelity criterion is defined by 

e = f (xU) - x(A ) ) dX (2.1) 

r h 

The condition for optimality requires that the original 

waveform be reconstructed with arbitrarily small e .■ 

There are several possible choices for the function f(-) 

in 2.1. It is desirable to choose a function for which there 

is a greater cost for large errors than for small errors . 

Since x(A) will be required to be a square-integrable 

function, a natural choice which satisfies the requirements 

2 

for the cost of making an error is the function f(x) = x . 
Equation 2.1, then, becomes 

x ( A ) ] 2 dA (2.1a) 


[ x ( A ) - 
A 


2.2 The Stochastic Process 


An experiment is defined as the observation of a point 
s in the stratum S. Each point seS is mapped into a 
spectral response function x(A). 

X: s -*■ x ( A ) (2.1) 


The function x(A) is a real-valued function of the continu- 
ous parameter A . 

Let a (A) be a non-decreasing function of bounded 
variation which is absolutely continuous. Construct a a- 
measure on the interval A such that 

w <*) ( 2 . 2 ) 


We require that x(A) belong to the Hilbert space L a of 

2 

all o-measurable functions for which the Lebesque-Stielt jes 

' ij ■ 


integral 


[x( A) ] 2 da ( A ) 
A 


(2.3) 


exists. The inner product which generates the metric for 
this space is 


(x,y) 


x(A)y (A)do (A) 
•'A 


(2.4) 


(Akhiezer and Glazman, 1961) . The norm is given by 
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|| X || = (x, x) ^ = ( [x(A) ) 2 dcx (A) 

A 


(Kolmogorov and Fomin, 1957) . 

Consider the subset L of L a which consists of the 
set of all possible spectral response functions which may 
be mapped from points in the stratum. The members of this 
subset {x(A), xeL } form an ensemble (Figure 2.4). This 
ensemble together with the probabilities of occurrence 
associated with the functions that belong to L specify 

b 

a stochastic process (Papoulis, 1965; Doob, 1963; Gikhman 
and Skorokhod, 1969) . 

Crane et al (1972) and others have shown that for 
remote sensing applications this stochastic process may 
be assumed to have a Gaussian probability measure. The 
Gaussian assumption is attractive because its mathematics 
are well-studied and tractable and because of its robust- 
ness. Robustness implies that good estimates of the density 
function can be obtained with a relatively small number of 
training samples and that statistical procedures on the 
process yield good results even for some non-Gaussian pro- 
cesses (Lachenbruch et al , 1972). An important property of 
a Gaussian process is that every linear function of x(A) eL q 
is a Gaussian random variable (Van Trees, 1968). Also, a 
Gaussian random variable is completely characterized by its 
first and second moments. The first moment or mean function 
of the process is denoted by 


2 



m(A) = E{ x( A ) } 


(2.5) 


and the second moment or covariance function is denoted by 

K(A,£) = E{ [x(A) - m(A) ] [x(S)-m(e)]} (2.6) 

where E { ) denotes ensemble expectation. The covariance 
is assumed to be continuous. 

2.3 Representation of the Stochastic Process 

The criterion that has been proposed for designing the 

spectral representation parameter for the sensor system 
is based on the ability of the sensor to represent func- 
tions belonging to the stochastic process. Of the possible 
techniques for representation of stochastic processes 
(Wong, 1971) , it would be desirable to choose a method which 
bears a close relationship to the physical model of the 
sensor. A well-studied technique is to represent the con- 
tinuous parameter stochastic process {x(A) , A e Lg} by a 
sequence of random variables which are the coefficients 
of a set of basis functions in a series expansion. The 
basis functions corresnond to the basis functions described 
for the sensor model in Figure 2.1. That such a representa- 
tion is possible without loss of information was shown by 
Bharucha and Kadota (1970) . 



be an infinite linearly independent set of functions 
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belonging to L . For an arbitrary function x(A) e L 

b v b 

we can associate the infinite series f J 


x(A) = x(A) = l x. <j) . (A) 
i=l 1 1 


(2.7) 


Note that for the series expansion the continuous parameter 

function x(A) is transformed to a point in the Hilbert 

space L whose coordinates are given by the vector of 
a 2 

T 

coefficients [x^, X 2 » ...] 

Without loss of generality the set (<{k(A)} will be 
taken to be orthonormal ; that is , 


(4> i ( A)r <t>j(A) ) 


1,'* 

(A) (A) da ( A ) 

(2.8) 

h 

(A) ( A ) w ( A ) dA 


1 , 

i = j 


0 , 

i 7* j 



If the set {(}>^(A)} is not orthonormal to begin with, it 
can be orthonormalized by the Gram- Schmidt procedure (Courant 
and Hilbert, 1953) . That such sets exist in Hilbert spaces 
has been demonstrated by the construction of sets such as 
complex sinusoids, Legendre polynomials, Tschebycheff 
polynomials and others. 
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The coefficients x. are the Fourier coefficients 

1 


defined by 


x.^ = (x ( A ) , ) 


(2.9) 


x(A) <p . (A)d a (A) 




x( A) <f> . (A)w(A)dA 


For a given set of basis functions the set of coefficients 

which minimizes the mean-square error between each function 

and its approximation are the Fourier coefficients (Courant 

and Hilbert, 1953) . Note that the set of coefficients 

(X. } can be treated as a vector X = [X , X , ...] . 

1 1 £ 

This vector representation of the function x(A) is a 
motivating factor in choosing this method of representing 
the stochastic process, since the vector representation 
provides an equivalent mathematical model to the physical 
sensor. 

Since the Hilbert space L has already been defined, 

a 2 

the corresponding definitions for the inner product and 

the metric follow. It is possible, therefore, to talk 

about a set {if>.(A)} which is complete in L and about the 
l a 2 

convergence of the sequence 

n 


V A > 


= I ( A ) 

i=l 


( 2 . 10 ) 
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to the function x(A). Convergence in L is convergence in 

s 

the mean. If 3 n (A) converges to x(A), then 

x(A) = l.i.m. 3 (A) (2.11) 

„ v n 


where l.i.m. is defined as 


lim 

n-H» 



[x ( A) 


Ai 9 

l X.<j>. (a) ]“ da ( A ) 
i=l 


0 


The problem of designing the optimal sensor becomes 
that of selecting the set of basis functions {<(> (A)) such 
that the series representation will be optimum with respect 
to the criterion. The criterion of minimum error in 
reconstructing a function is extended to the stochastic 
process where the expectation of the mean-square represen- 
tation error is taken over the ensemble 

E {e^eQ [x(A) - x ( A / ] 2 da ( A )j (2.12) 

We now propose a list of properties which would be 

*5 

desirable for the optimal design to have. Because it 
would be impractical to transmit an infinite or even a 
very large number of spectral channels over a data link 
to a processor as well as difficult for any processor to 
handle such volume, it is necessary that the representation 
of the signal space be characterized by a small number of 
dimensions. The series expansion provides a countable set? 
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hence, by ? truncating the series at some appropriate number 
of terms N a finite dimensional signal space is obtained. 
This finite dimensional space is only an approximation to 
the entire space L g , but it is desired that the approxi- 
mation be an adequate representation for L . 

D 

If we form the sequence { 3 ^}, where 
n 

B (A) = l x . <J> . (A) (2.13) 

n i=l 1 


it would be desirable that this sequence converge to x(A) in 
the mean-square sense. This convergence guarantees that the 
series can be made arbitrarily close to x(A) by increasing n. 
Another desirable property is that the convergence be 
rapid in the first few terms. One would expect that an 
increase in the number of terms in the expansion would 
reduce the representation error. It is desirable, though, 
that each additional term decrease the representation error 
by a maximum amount. A plot of the expected mean-square 
representation error as a function of the number of terms 
n would show a large decrease in the mean-square error 
for the first few terms with a considerably slower rate of 
convergence for higher order terms. If the expansion is 
truncated after N terms, the series 


V A) 


N 


l x. * . ( A) 
i=l 


should represent the function x(A) with minimum expected 
mean-square representation error. 
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We would also like the representation in terms of the 
optimal basis functions to be complete in the following 
sense. Let T be the minimum acceptable expected mean- 
square representation error. If the representation by 
the series expansion for some finite number of terms is 
such that 

E U r }= e|J [x(A) - B n (A) ] 2 daU)j < T (2.14) 


then the set of N basis functions will be complete in the 
N-dimensional subspace of L that has an expected error 

O 

less than T. 

The completeness of the set can be expressed in terms 
of the coefficients of the expansion. Squaring and inte- 
grating term-by-term the expression in 2.14 becomes 


E 


< E r> = 


E 


[x ( A ) ] da (A) 


> 1 - l 

J i=l 


E{ 


x. 


since, E { e }- 0 
r 


^ E{|x i ) 2 }< e[J [x{ A ) ] 2 da ( A )j (2.15) 


Inequality 2.15 is Bessel’s inequality and guarantees that 
the sum of the squares of the coefficients always converges. 
Furthermore, if there is equality, then Bessel's inequality 
becomes Parseval's equality 


00 . — 

1 E{ jx | 2 }= E [x(A)l 2 da(A) (2.16) 

i=l 1 J 




Z&ifrcxx: 


3/5 

I 


and the set (<J>^(A)} is said to be complete. The direction, 
here, is to find the complete set (<f>^(A)} and use only jj 
those terms in {<j> (A)} which provide a good approximation 
(E { e r }< T) in the mean-square sense. 

To obtain the optimal set of basis functions {<j>^(A^} 
some results from linear integral equation theory are 
required (Courant and Hilbert, 1953; Akhijezer and Glazman, 
19 61; Riesz and Sz.-Nagy, 1956; Lovitt, i'|924; Tricomi, 

1957; Ash, 1967), \ 

The linear integral operator ^ on L g is defined by 


% x(A) 


k('A, e) x(C) d a U) 

h 


(2.17) 


where k(A,£|) is the kernel of the operator. An operator 

is compact if for every bounded sequence of functions 

{X (A)}, the sequence of functions i x (A) has a con- 
n 1 m 

vergent subsequence. A bounded operator is self-adjoint if 
(t x, y) = (x, ^y ) 

We now state a theorem and some consequences of that theorem 
which will determine the set of basis functions (<(k(A)} 

Theorem ; If ^ is compact and self-adjoint, then the 
solutions to the linear homogeneous integral equation 

y i <f. i (A) = ^^(A) 


(2.18) 


is a set of eigenfunctions (<|k(A)} with corresponding 
eigenvalues y^. The following statements can be made: 

- The eigenvalues are real 

- The eigenfunctions form a basis for the space L 

s 

- The eigenfunctions for distinct eigenvalues 
are orthogonal 


- The series E x.<}>.(A) converges in mean-square 
i=l 1 1 

to x ( A ) . 

The covariance function K ( A , £ ) satisfies the necessary 
conditions on the kernel. Since the covariance kernel 
is Hilbert-Schmidt , 


2 \ f 2 

K( A , 5) | da ( A) da ( A) < E[x(A)] do(A) <- 


(2.19) 


it can be shown that the operator K is compact (Weston, 19 77). 
The covariance function is real and symmetric; hence, it is 
self-adjoint. If the covariance kernel is non-negative 
definite, the inequality 


p f 
■A- 


K(A,£) x ( A ) x(S) da ( A ) da(?) > 0 


( 2 . 20 ) 


is satisfied and the eigenvalues are non-negative. If the 
kernel is positive definite then the inequality is strict 
and the eigenvalues are non-zero and positive (Van Trees, 
1968) . 


The proof of the theorem draws upon results that 
have been well-established in the literature of Hilbert 
Space theory (Ash, 1967). 
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The random variables x ^ generated by the linear 
functionals are uncorrelated 


E(x. x . } = E 
i 3 


I x(A) <p. (A)dA x(£) <f> . U)d£ (2.21) 

J 1 J 3 




(A) 


K(A,£) (?) d? d<#> 


( / V 1 i=j 

= Y • <f> • ( A ) <j> . ( A ) dA =1 

] J 1 11 L 0 i^j 


If x(A) is a Gaussian process, then the coefficients 
x^ are independent Gaussian random variables (Ash, 1967) . 

It is possible to order the set of eigenfunctions 
{(ji^(A)} such that the sequence a n (A) for n = N, fixed, 
minimizes the expected mean-square representation error. 

To accomplish this ordering, the corresponding eigenvalues 
are ranked such that 


Y 1 " Y 2 “ Y 3 “ * * ‘ * 


The expected mean square error for N terms is 
(Brown, 1960) 


- 


-00 - 

2 

E { e } | . =E 

r 1 n=N 

•A 

I x .<(» .( A ) 

L i=N+l -j 

da ( A ) 


( 2 . 22 ) 


= l Oil ) 

i=l 


(A) da ( A) 


i=N+l 
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2 

since from equation 2.21 E{|x^[ } = y^. The graph of the 

expected mean-square error as a function of the number of 
terms N in the expansion will show a sharp decrease in the 
error as the first terms are added. As an example consider 
the second-order stochastic process on the interval [-1,1] 
with mean zero and covariance K(A,C) = exp( -|x-c[). 

The eigenvalues are given by Van Trees (1968, p. 188) 

Y = __2 (2.23) 

1 l 

where the b^ are solutions to the equation 

(tan b. + b. ) (tan b. - JL) = 0 (2.24) 

ii ib. 

l 

A graph of the expected mean-square error for this process 
as a function of the number of terms (Figure 2.5) illustrates 
the desired rapid convergence property. It is important 
to note that for a fixed N the best set of N basis function 
from the set of all possible basis functions is the ordered 
set (<t>^ (A)}, i = l,2,...,N. 

r In an effort to present an intuitive interpretation of 

the eigenvalues and eigenfunctions consider the first 
eigenvalue and its corresponding eigenfunction for a partic- 
ular stochastic process. The first eigenvalue is found 
by choosing a function <j> ( A ) which maximizes the variance 
of the coefficient of that function. That is, the co- 
efficient is given by the linear functional 



40 



x ( A ) <j> (A)da(A) 
'A 


(2.25) 


The variance of x^ is equal to the first eigenvalue 
(2.21) which was chosen to be the largest of the set of 
eigenvalues. Since the variance is the largest for the 
coefficient x^, the uncertainty about the original function 
x ( A ) is reduced the most by using the first term. From 
a Shannon information theory point of view (Shannon, 1948) 
knowing the value of the coefficient x^ provides the most 
information concerning the input signal that a single 
measurement can give. From the argument of being able to 
reconstruct the waveform, the coefficient gives the 
single most valuable measurement from which the input 
signal could be reconstructed. 

The first eigenfunction can be used to identify por- 
tions of the spectral interval which may be more useful 
than others. If at a point A on A the value of <f>^ (A) 
is close to zero, then the contribution to is not signifi- 
cant. On the other hand, if at a point A, <j>^(A), is 
significantly different from zero, then,, the spectral 
response at that point may be of importance. 

The second eigenvalue and eigenfunction attempt to 
find the second most useful portions of the spectral 
interval. The variance of the second coefficient x^ is 
the second largest since the eigenvalue is the second 
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largest. The third eigenvalue and eigenfunction corres- 
pond to the third most useful and so forth. Therefore, 
there is an ordered sequence of eigenfunctions <P^(A), 

<t> 2 (\) , ...whose corresponding coefficients x 1 , x 2 , ... 
provide a decreasing amount of information and a decreas- 
ing contribution to the reconstruction of the original 
function x(A). Based on the ranking, the eigenfunctions 
provide some intuitive indications concerning the importance 
of the points in the spectral interval. 

A useful concept when discussing a signal set is the 
dimensionality of the signal space. The dimension of a 
signal space can be defined as the minimum number of basis 
functions required to completely reconstruct any function 
from the ensemble (Bennett, 1969) . The orthogonal 
expansion which we have just derived provides an approxi- 
mate method of determining the dimensionality of the obser- 
vation space. If T is the value of expected mean-square 
error such that the approximate representation using only 
enough eigenfunctions to reduce E(e) to a level below T , 
then the number of eigenfunctions is a reasonable approx- 
imation to the dimensionality of the signal space. 

A general method has been developed for obtaining an 
optimal set of orthonormal basis functions such that 
if we choose an acceptable value of expected mean-square 
representation error E{e), the series expansion can be 
truncated at some finite number N which will represent 



any function in with E{iv^_} less than the required 
value. Built into this derivation is the capa- 
bility of adding a priori information which may be avail- 
able concerning the spectrum. If we let w(A) =1.0 for 
all A e A. the series expansion is identical to the 
Karhunen-Loeve expansion derived in many texts (Davenport 
and Root, 1958; Van Trees, 1968; Middleton, 1960). When 
the weighting function is unity for all A, the expansion 
will be referred to as the unweighted Karhunen-Leove 
expansion. 

Due to measurement difficulties in water absorption 
bands and differences in detector characteristics it has 
become apparent that the use of a weighting function dif- 
ferent from the uniform one used above may be advantageous. 
The use of the weighted Karhunen-Loeve expansion has 
appeared only briefly in the literature (Kailath, 1971; 
Kailath, 1974). It is thought that the lack of wider use 
for the weighted Karhunen-Loeve expansion is due primarily 
to a lack of need for it until this time. The weighted 
Karhunen-Loeve expansion will be used extensively in the 
results to be presented later. 

2.4 Discrete Approximation 

It is proposed to solve the optimal sensor problem 
described in the previous section on a digital computer 
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using real data taken in the field. However, the solution 
must be approximated in order to take into consideration 
some practical constraints. First, the spectral response 
functions are not available as square-integrable functions 
on the dense set {X e A}. The functions are obtained in 
the .field by sampling the spectral response} with an instru- 
ment that uses very fine spectral windows. Secondly, the 
parameters of the process are not known a priori; hence, it 
is necessary to estimate the mean and covariance functions 
using a representative sample from the ensemble. Finally, 
because the data will be stored and processed digitally 
it is necessary to quantize the amplitude of the response 
at each of the spectral sample points. Each of these con- 
straints can potentially contribute to the representation 
error for the process. In this section we want to consider 
the significance of the error due to spectral sampling, 
ensemble sampling and quantization. 

2 . 4. 1 Spectral Sampli; q 

Up to this point the spectral response functions have 

been treated as functions of the continuous parameter A. 

Suppose that the function x(a) is sampled at L intervals. 

Each spectral response function then becomes a vector u = 

T 

[u^, u 2 ' U 3 ' •••' u l^ * would be desirable to use a large 

enough number of sample points such that the error intro- 
duced by sampling the spectral interval is not significant. 
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Note that once the sampling has been done the dis- 
crete equivalents to the solutions of the eigenvalue problem 

V) 

are used. The linear integral equation becomes 

r 4> = KW4> ) (2.26) 

"here is the LxL matrix of eigenvectors, K is the LxL 
covariance matrix given by 

E{ [ Ui - U.J [u. -u ] } (2.27) 

u^ = E { u^ } 

r is the diagonal matrix of L eigenvalues, and W is the 
diagonal matrix of L weighting coefficients. 

Because the actual covariance function is unknown, the 
loss of information or representation error from sampling 
cannot be evaluated. However, we can derive an expression 
that gives some insight into the effect of the error due 
to sampling. To evaluate the error due to sampling, the 
interval over the random process with mean m(A) and co- 
variance k(A,£)r the interval A is partitioned (Figure 
2. 6) into L equal intervals with L + 1 end points A^. 

Define a set of sampling functions by 

1 

V— zr ~ _ , A < A < A. 

0 (A) =^AW(A) 1-1 1 A e A (2.28) 

0 , elsewhere 

where A A = A^ - A^_^. The waveform x(A) can be approximated 


by 






Using the expression for mean-square representation error, 

( 2 . 30 ) 


e S = 


A 


[x( X) - x L ( A) 3 da ( A) 


the form for the expected error due to spectral sampling 
can be derived. The coefficient for the ith sampling 
function is given by 


x i 


x ( A ) 0 . { A ) da ( A ) 


( 2 . 31 ) 


A 


The expected mean square error due to sampling is 


E{ Eg } = E ^ J [x ( A ) - x^ ( A) ] ‘‘‘da ( A ) 


■] 


( 2 . 32 ) 


= f I< ( A , A ) da ( A ) - eF l [x. -m. ] 2 + f m 2 (A)da(A) - J m 2 
•'A Li=i 1 1 J •'A i=l 1 

where m(A) =E{x(A)} and nr = E{x^}. If the number 


of intervals L approaches infinity, 

r L 


' n j 1 f 

lim E l (x.-m.) = K(A,A)da(A) 

Li=l 1 1 -1 ■'A 


( 2 . 33 ) 


and 


lim l m7 = 
L-v«> i=l 1 


m ( A ) da ( A ) 


( 2 . 34 ) 


and the limit for the expected error is zero. 
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The discrete version of the Karhunen-Loeve expansion 

can be evaluated using the sampled spectral response 

functions. Let y. = x. - m, , then 

J 1 1 i' 


Eiy ± } = 0 


(2.35) 


E{y .y . } = k. . 
■'r i 13 


The discrete form of the integral equation is 


Vl = KW *L 


(2.36) 


where T t is the diagonal matrix of eiaenvalues y_ and <J> 

JU ** ii 

is the matrix of eigenvectors tj> T for the L sampling 

L i 

intervals. Therefore, 


E .1 (x ± -m ) 2 = l y 
1 = 1 1 = 1 


(2.37) 


Hence, the expected error is the difference between the sum 
of the eigenvalues for the unsampled covariance function 
and the sum of the eigenvalues for the covariance matrix 
plus the difference between the integral of the mean 
function squared and the sum of the squares of the 
elements of the mean vector. 


EU S 1 


= l 

i=l 


L 

n- 7 


i=l 


m' ( A) da (X) 


A 


L 

- I 

i=l 


2 

m i 


(2. 38! 


As an example consider the second-order zero-mean 
process described earlier with covariance K(A,5) - exp(- 1 A-^j) 



Suppose the interval [-1,1] is partitioned into 20 sub- 
intervals and the eigenvalues for the 20 dimensional system 
is computed. The expected mean square representation error 
due to sampling is: 


E{e S } * L=20 


•£ 


dA- l t 
i=l 


( 2 . 39 ) 


= 2.0- l y 

i=l 1 


The first ten eigenvalues for the continuous covariance 
and the sampled covariance are listed in Table 2.1. 

Table 2.1 Eigenvalues for continuous and sampled covariance 



EIGENVALUES . 



CONTINUOUS 

SAMPLED 

1 

1.149 

1.149 

2 

.391 

.390 

3 

.157 

.156 

4 

.080 

.078 

5 

.047 

.046 

6 

.0 31 

.029 

7 

.022 

.020 

8 

.016 

.015 

9 

.012 

.011 

10 

.010 

.008 


The expected mean-square error due to spectral sampling 
for 20 terms is 0.065. Depending on the form of the co- 
variance function and the mean function, one can choose a 




49 




8 

| 


i 



sufficient number of samples L to reduce the error to a 
negligible value. 


2.4.2 Ensemble Sampling 

Ideally, one would have available the complete ensemble 
from which the stochastic process could be accurately 
characterized. Unfortunately, a complete ensemble may 
require an infinite number of sample response functions 
since there are an infinite number of points in a stratum. 

A reasonable alternative is to select a representative 
sample from the ensemble from which the unknown parameters 
may be estimated. In sampling the ensemble one is con- 
cerned with the number of samples that are needed and how 
the sampling is done. By a 'representative' sample it 
is implied that the sample functions are taken from all 
typical observations in the stratum. 

The number of samples required to adequately estimate 
the eigenvalues and eigenvectors can be evaluated in a 
straightforward manner. Using perturbation theory 
(Wilkinson, 1965) , a first order approximation to the 
estimates of the eigenvalues and eigenvectors can be 
derived as functions of the covariance estimate 


Y i 


T A 

> 7 K <j> . 
l Y l 





T * 

<J) . K <f> . 
1 

,Y rg> 




(2.40) 

(2.41) 




where and <j>^ are estimates of the eigenvalues y^ and 
eigenvectors <j>^ respectively. These estimates are 
approximately unbiased (Fukunaga, 1972) since 


E {y j _} = -{<}>? K 4 » i > = <j>;F Ktj> i = Y;l 


(2.42) 


and 


E (<j> . } 
1 


) . + 
x 


L 

I 

3=1 L 
j=i 


r . T 


i E {K> <P • 


(Y ± “ Y j ) 


3 T i 


(2.43) 


The variances of the estimates are expressed by 


Var [y ± ] = E {y ± ~ y^ 2 } ~ E { ( <)>T K <|)^) 2 - y 2 


(2.44) 


and 


Var [t ± l = E (||$ i - <f>..|| 2 } ~ 


L E (<{>. K - ) 
V 

j=l (Y i“ Y j ) 

j^i 


(2.45) 


T * 2 

The term that must be evaluated is E {(<j>_^K<j>.) . The 

derivation follows that of Fukunaga (1972) from which the 

result is shown to be 

m 2 (N -1)N ~ N ? 

E {{*: K *.r}= — 2 5 S.yf.fi.. + 2 yf 6 (2.46) 

1 3 (N -ir 1 13 (N_”l ) 2 1 ^ 


N, 


2 Y i Y j 


(N S" 1) 
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Where N is the number of sample functions in the ensemble 

and 6. . is the Kroncker delta. Now the variances can be 
il 

written 


Var [ y ] 


4 V 1 

<v 1)2 


(2.47) 


and 

N N y . y • 

Var [}.] = I y “ - 7 (2.48) 

j=l (N -1) (Y.-Y.) 

j*L S 10 


Note that the variance of the eigenvalue is proportional to 
the square of the eigenvalue. The variance will decrease 
as the number of samples is increased and asymptotically 
approaches zero as N approaches infinity. Since the 
expected mean-square error is a function of the eigenvalues 
the estimate of the error is also asymptotically unbiased. 

The variance of the eigenvectors is very large when 
two eigenvalues are close together. If all of the 
eigenvalues are well separated the variance of the 
eigenvectors approaches zero as N approaches infinity. 

vD 

2.4.3 Quantization 

Quantization of the amplitude of each element in the 
output vector is necessary for subsequent data transmission, 
storage and digital processing. A Q-level quantizer divides 
the amplitude range into Q equally-spaced intervals. The 
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ith interval has end points x^ and x^ + ^ and output level 
equal to (x^ + ^ + xj /2 . 0 . The expected mean-square 
error due to quantization is given by Max (1960). 


Q-l f i+1 


*‘V-,U 


i=0 •'x. 


(x - yj p ( x) dx 


where p(x) is the probability density function for the 
random amplitude x. 

Suppose the interval corresponding to the amplitude 
range has length I and is divided into O subintervals. 
Assume that the probability density function is very 
small outside I. The length of each subinterval is 
the ratio I/Q. An upper bound to E { e } can be found 
easily by noting that 

l 2 

2 i 2 

(X- Yi ) < (|l) = -f (2. 


Q r X i+l 


.1 f 

i=l x 


p(x)dx = 1 


Therefore, 


EtE g ) 


By keeping the length L^. reasonably small the quantization 
error will not be significant. 
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If the probability density function has a significant 
portion of the function outside the designated amplitude 
range the expected error may increase substantially. The 
quantizer will assign the value y^ to all values of x 

greater than x„ , and y to all values of x less than x, . 

Q o 1 

If x is outside the amplitude range saturation will occur. 
The mean-square error will increase significantly if this 
situation occurs. 

The total expected mean-square representation error 
is a function of the errors due to truncation, spectral 
sampling, ensemble sampling, and quantization. It lias 
been demonstrated that the error due to quantization is not 
significant. In fact the uncertainty in the measuring 
devices is considerably greater than the uncertainty due 
to quantization. Since the covariance and mean function 
are not known apriori it is not possible to evaluate the 
expected error due to spectral sampling. However, it was 
demonstrated that for a known case the number of samples 
required to reduce the error to a negligible value was 
not unreasonable . Therefore, in the experimental work 
it will be assumed that the expected error due to spectral 
sampling will be sufficiently smaller than the average 
error in the measuring system. 

Since the estimates of the eigenvalues and eigenvectors 
are unbiased, it is expected that the corresponding error 
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due to the fact that only a finite set of sample functions 
was available would be small, especially if the number of 
sample functions was sufficiently large. 

The principal source of error which will be considered 
will be the error due to truncation. Hence, provided that 
some care has been taken with regard to the number of 
spectral samples, number of sample functions, and the 
length of the quantization intervals, the approximation 
to the continuous case is not unreasonable. 


I 

f- 



CHAPTER 3. THE PARAMETERS AND OVERALL 


SYSTEM PERFORMANCE 

\ 

The intent in choosing a particular sensor design is 
to optimize the expected performance of the pattern recog- 
nition system with respect to the global performance 
criterion e . The quantity is a complicated function 
of a set by parameters a. By varying the parameters a 
search can be made to find the best combination to 
optimize e . The first step is to list the parameters. 

Five parameter categories were listed in Chapter 1: 

- spectral representation 

- spatial representation 

„ -4 

- signal-to-noise ratio 

- ancillary data 

- information classes 

The problem is to quantify these parameters categories such 
that an optimization procedure can be applied. As a preliminary 
step, it is proposed to consider each category individually 
and study the relationship between that category and the 
global criterion. The other parameters will be held 
constant while allowing the parameter under investigation 
to vary. It is also important to understand the inter- 
relationships between the parameters ; a change in one 


m 




j 
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parameter may influence the performance criterion both 
directly and indirectly through another parameter. 

The primary focus, here, is on the spectral represen- 
tation parameter and its corresponding quantity, mean-square 
representation error, e^. In Chapter 2, an analysis 
procedure was developed in which an ordered sequence of 
basis functions allows the spectral response function to 
be represented with decreasing expected mean-square error. 

It remains to show the effect of the spectral parameter 
on the overall system performance. This chapter will 
first deal with the relationship between e q and e^, followed 
by a discussion of some research results relating other 
parameters to e q . 

There are a variety of processors which can be used 

to evaluate a data set depending on the nature of the 

problem. Typical processors include separability computers, 

linear classifiers, quadratic classifiers, non-parametric 

classifiers and context classifiers. We will choose the 

maximum likelihood Gaussian classifier as an example of 

a quadratic classifier which will be used as representative 

processor for evaluation of the pattern recognition system. 

Let X be an observation from one of M classes C, , 

i 

i=l,2, . . . , M, with a priori probabilities P^. The maximum 
likelihood decision rule can be stated as follows: Assign 

X to the class if 
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p k p < x l c k ) = max { p .p(X |C. ) } (3.1) 

i 

where the p(x|c\) are the class conditional probability 
density functions. If fl is the observation space, then this 
rule partitions ft into the subspaces ft , ft^, . . . , ft M corre- 
sponding to the classes C^, , . .., C , respectively. 

The probability of correct classification has found 
widespread use in the pattern recognition and remote 
sensing community, and will be used here as the system 
performance criterion. For a multivariate, multiclass 
pattern recognition problem the probability of correct 
classification is defined as 

p = | max ( p . p (Xlc. ) } dX (3.2) 

c 'ft i 1 1 

where p(x|Ci) is the conditional jointly Gaussian probability 
density function for class i. 

3.1 Spectral Representation 

The expected mean-square error, E {e^}, has 
been used as a measure of the fidelity of the spectral 
representation. The Karhunen-Loeve expansion has been 
developed as a means of representing the spectral response 
functions in the ensemble by a finite series expansion 
such that E { } is minimized. We wish to study the 
relationship between the spectral representation and the 
performance of the overall pattern recognition system. 
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If the stochastic process is completely known, 

and if all the terms in the Karhunen-Loeve expansion 

are used in the representation, then a decision scheme 

exists which is optimal in the sense of maximizing the 

probability of correct classification. For the M-class 

pattern recognition problem an experiment is defined 

whose outcome is the vector X belonging to the set of all 

possible outcomes. A decision scheme is realized by 

partitioning the observation space into M regions such 

that if X belongs to then the decision, 'X belongs 

to class C.,' is made. Such a decision scheme can be 
1 

arrived at by evaluating the a posteriori probabilities 
for each class. The a posteriori probability (P (Ch | X)) 
is the conditional probability that class Ch occurs given 
that the measurement value is equal to X. If the vector 
X is finite dimensional, then it is straightforward to 
evaluate the a posteriori probabilities and, using equation 
3.1, to design a classifier which is optimal in the sense 
of .maximizing the probability of correct classification 
(Anderson, 1958) . 

This approach has often been generalized to the 
case where the vectors are infinite dimensional and the 
outcomes are real functions x(^) on an interval (Grenander, 
1950; Kadota, 1964, 1965; Van Trees, 1971). The procedure 
begins by representing observed sample function in the 
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ensemble by a finite vector [x^, x^ , x^] of 

coefficients which are the coefficients in the Ka.rhunen- 
Loeve expansion. The a posteriori probability for each 
class is constructed and the limit as N approaches 
infinity of the conditional probability P(CL |x^, . .., x^) 

is taken. Bharucha (1969) has shown that this limit exists, 
and furthermore, that the resulting decision scheme opti- 
mally partitions the observation space such that the 
probability of correct classification is maximized. 

We now consider the implications of the constraint 
that the number of terms in the expansion be finite 
has on the classification performance. If N features are 
used, it cannot be guaranteed that the first N features 
are the best for discriminating between M classes in a 
particular remote sensing problem (Foley and Sammon, 1975) . 

A simple example has been used to demonstrate this fact. 
Suppose there are two features and it is desired to use 
only one feature to discriminate between the two classes. 

Let the classes be distributed as shown in Figure 3.1. 

Based on the criterion of minimum mean-square representa- 
tion error the basis function <j> should be chosen. However, 
it is obvious that <p is the better choice for discriminating 
between the classes. Hence, if say ten terms are used in 
the representation, it may be true that the 34th term, for 
example, is superior for discriminating between classes 
than some of the first 10 features. 



Figure 3.1 Two distributions which demonstrate a potential 
difficulty in using the best feature for 
representation to perform classification. 
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We wish to develop, here, the relationship between the 
expected mean square error for an optimal set of basis func- 
tions and the probability of correct classification. If the 
stochastic process is completely known, a decrease in the 
mean-square representation error does not result in a de- 
crease in the probability of correct classification. The 
addition of a measurement or feature does not decrease the 
separability. If the added measurement contributes only 
noise, then the separability of the distributions is the 
same as without the added measurement. This monotonicity is 
implied in the convergence of the a posteriori probabilities 
as N approaches infinity. 

The intent here is to have as small a value of E{e } as 

r 

possible or at least drive it well below the average measure- 
ment noise. Every decrease in E{e } is known to not decrease 

r 

p . Returning to the example, we would not choose only one 
c 

feature if we could help it, but rather choose to keep both 

features since this would reduce E{e } to zero for the two- 

r 

dimensional case. 

If the probability of correct classification is plotted 
as a function of the expected mean-square representation er- 
ror as sketched in Figure 3.2, some important insights into 
the nature of the data can be gained. We know that as the 
expected error decreases the classification accuracy does 
not decrease. The monotonicity is indicated by the solid 
line in the figure. We wish to observe the behavior of the 


Probability of correct classification 



relationship between the expected error and the classi fica- 
tion performance as E{e^} becomes small. It may occur that 
at some point P, a large decrease in the expected error 
results in little or no change in the classification per- 
formance as indicated by the dashed line (a) . In this case 
the number of terms required to represent the process with 
an error of T is sufficient for the information classes 
chosen. One may be able to evaluate the portions of the 
spectrum which are of most value based on the first few 
eigenvectors. Also, in this case one can estimate the 
maximum classification performance that can be achieved by 

noting the value of P that the graph is approaching as the 

c 

expected error becomes small. 

Suppose, however, that at point P a small decrease 
in the expected error results in a significant improvement 
in classification performance as indicated by the dashed 
line . In this case more terms are required to attain 
the maximum discrimination capability. Also the eigenvec- 
tores which correspond to the largest improvements in 
performance can be analyzed to determine which spectral 
regions are contributing the most. 

Several times in this discussion the condition that 
the process be completely known was stated. If the process 
is not completely known, but must be estimated from a finite 
data set then the situation becomes different. The effect 
of a finite data set size is now discussed. 
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3.2 Ancillary Data 

Ancillary data is information other than the spectral 
response functions themselves associated with a stratum 
which has bearing on the performance of the system over 
that stratum. An example of ancillary data which is 
important for this research is the design set. Sample 
response functions drawn from the ensemble are used to 
design the classifier. For a maximum likelihood Gaussian 
classifier the design procedure is to estimate the mean 
vectors and covariance matrices for each class from the 
design set. 

For a fixed number of features or dimensions, it is 
well known that if the design set is used to test the 
classifier performance, the estimate of probability of 
correct classification will be optimistically biased 
(Fukunaga, 1972; Toussaint, 1974). That is, the estimate 
is better than the true performance. If a test set, con- 
sisting of sample functions from the ensemble different 
from those in the design set, is used, the performance 
estimate is inferior to the true performance. If the 
number of sample functions is increased the estimates 
of classification performance both approach the true 

performance. If the number of sample functions N approaches 

s 

infinity, the probability structure will become completely 
known and the true performance can be evaluated. 
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Now, consider the case where the number of sample 
functions is fixed while letting the number of features 
be a variable. If N^ is infinite, increasing the number 
of features from N to N+l will either improve the per- 
formance or the performance will be the same for N and N+l 
features. However, if N g is finite, increasing the number 
of features may have an adverse affect on the performance 
estimate . 

Three research results have been published which 

attempt to determine the relationship between the design 

set size and the number of features. One of the first 

attempts to quantify and explain this relationship was done 

by Allais (1964) . The study involved the linear prediction 

problem which is closely associated with the linear two- 

class pattern recognition problem. Allais showed both 

analytically and experimentally that for a fixed N , 

s 

increasing the number of measurements improved the per- 
formance for a while until a certain peak was reached, 
after which the performance deteriorated drastically. 

A second research result reported by Hughes (1968) 
showed the same peaking for mean recognition accuracy as 
measurement complexity is increased. The mean recognition 
accuracy is the average over all discrete non-parametric 
probability structures of the correct, recognition 
probability using the Bayes recognition rule. The measure- 
ment complexity is the total number of discrete values and 


66 


is equal to the product of the number of features and the 
number of quantization levels. Hughes argues that increas- 
ing the measurement complexity necessarily means that 

there are fewer samples, N , per measurement cell available 

s 

to estimate the probabilities associated with each cell. 
Hence, when the classification accuracy is computed 
using these cell probabilities, the average classification 
accuracy will decrease as the number of features increases 
if the design set size is too small. 

A third research result is due to Foley (1975) 
who studied two-class multivariate Gaussian pattern recog- 
nition problems with different means but identical co- 
variances. An analytical expression was developed to 
determine what the ratio of design set size to feature 
size should be to obtain a good estimate of the performance 
of the classifier. A ratio of 3-to-l was considered to 
be a good engineering rule-of-thumb for choosing the number 
of features for a given sample size. 

These results have been somewhat controversial and 
often misinterpreted, especially the work by Hughes, and 
have frequently been discussed in the literature (Kanal 
and Chandrasekaran , 1971; Abend et al, 1969; Chandrasekaran , 
1971; and Chandrasekaran and Jain, 1974, 1975) . 

The underlying cause of the influence of sample size 
is due to the statistical uncertainty that occurs in 
estimating the statistics for the classes. As sketched 


67 


in Figure 3.3 the positive bias in the performance estimate 
when testing on the design set increases with the number 
of features used. This bias is due to the cumulative 
effects of the uncertainty in estimating the statistical 
parameters (Chen, 1978) . When the test set is used to 
evaluate the performance the bias decreases as more features 
are added. The end result is that a positive bias becomes 
significant at some point determined by the sample size 
for the estimate on the training set. A degradation in 
the performance occurs at the same point for the estimate 
based on the test set. 

A concept which is brought out in much of the litera- 
ture dealing with the relationship between feature size and 
design set size is that the more a priori knowledge 
about the_.under lying probability structure that is available 
the more features that can be used with a given data set 
size (Foley, 1972) . Conversely, for a fixed number of 
features, added knowledge of the probability structure 
allows one to reduce the number of design set samples 
collected (Mogera and Cooper, 1977) . As an example, the 
fact that the probability densities are assumed to be 
Gaussian implies that fewer sample functions are required 
to get good estimated of performance than if no parametric 
assumption was made. 


Figure 3.3 The effects of sample size on classification 
performance as a function of the number of 
features,, a) true performance, b) Positive 
bias in P c due to testing on the design set, 
c) Negative bias in P c due to testing on 
the test set, d) estimate of P c when testing 
on the design set, e) P c for testing on the 
test set. 
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3.3 Information Classes 

An information class refers to the label assigned 
to the sample points in the stratum. The labels are chosen 
to be meaningful in the context of the pattern recognition 
problem under consideration. Since we are looking for a 
sensor which will work well for a variety of pattern 
recognition problems, we consider the influence of the 
choice of information classes on the overall performance 
criterion for the pattern recognition system. 

We first note that there is an intrinsic set of classes 
which is associated with each stratum. For example, in some 
strata the class list may consist of primarily vegetation 
classes; whereas, in other strata urban classes may be 
predominant. For each stratum a non-unique hierarchial 
tree structure may be constructed (Figure 3.4) (Landgrebe, 
1978) . To construct the information tree it is important 
to remember that the class list must be exhaustive; that is, 
every point in the stratum must be assigned to one of the 
classes. The choice of the class labels depends on the 
informational value that they have to the user. At the 
top of the tree the classes are easily separable using 
few features. As one selects class sets which are deeper 
in the tree structure, it becomes increasingly more diffi- 
cult to discriminate between the classes . 

An example using artificial data can be generated 
which demonstrates the effect of the choice of information 
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Figure 3.4 An information tree for a typical stratum (Landgrebe, 1978) 
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classes on the probability of correct classification. As- 
sume that the data is two-dimensional and that a tree 
structure can be drawn as follows: 



where I and II denote the first level classes and a, b, c, 
and d denote second level classes. Let the mean vectors 
and the covariance matrices for the four classes be 


class 

a 

M = 
—a 

10.0 

11.0 

K = 
a 

3 

3/2 

3/2 

5 

class 

b 

\ ■ 

10.5 

11.0 

5, = 

r 

2 

1/2 

1/2 

3/2 

class 

c 

ri = 

— c 

15.0 

9.0 

K = 
c 

8 

5/4 

5/4 

10 

class 

d 

Md - 

14.5 

9.5 

K a - 

6 

1 

1 

4 


Plotting 20 random points from each of these distributions 
in two-dimentions gives some idea of the four distributions 
(Figure 3.5) . 



The performance can be evaluated for the first level 
by combining the statistics assuming that the four classes 
are equally probable. A performance estimator was used 
to evaluate the probability of correct classification from 
the known statistics. The overall probability of correct 
classification at level one is 0.91 whereas the overall 
probability when attempting to discriminate between the 
four classes is 0.59. One can readily see from this example 
that the choice of classes will effect the overall per- 
formance criterion. 

Recalling the graph of the classification performance 
as a function of expected mean-square error,, a different 
set of information classes may alter the graph significantly. 
In general, information classes that are deeper in the 
information tree will require smaller representation error 
to achieve a specified classification performance. 

Kulkarni (1978) provides further discussion of the per- 
formance of a classifier as a function of the design set 
size, the measurement complexity, and the depth of the 
information tree. 

One can also observe that the information classes 

present in a stratum influence the selection of the 

optimum set of basis functions (<f>^(A)}. Let each 

class have a Gaussian probability density with mean 

function m. (A) and covariance function K. (A,E) , i— 1,2, 
i l 

... , M. The covariance function for the stochastic 
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process can be written as a function of the class condition- 
al mean and covariance functions. 

K(A,0 = E{ (x ( A ) -m(A)) (xU) -m(U)} 

M 

where m(A) = 7 P.m. (A) 

i=l 1 1 


M M 

K(A,C) = l P k K k (\,Z) + l 


k=l 


k=l 


"k 


(A) 


M 


~ I 


i=l 


P.m. (A) 
i i 


M 

m, (4) - l P.m. U) 

K .,11 

1 = 1 


For the special case where M=2, this equation reduces to 


K(A,£) = P 1 K 1 (A,5) + P 2 K 2 (A,C) + P 1 P 2 tm 1 (A) 

- m 2 ( A) ] [m U) ~ m 0 ( ?) ] 

Recall that K(A / ?) is the kernel of the integral equation 
which is solved to obtain the optimum set of basis func- 
tions { cf> , ( A ) } . Hence, the information classes determine 
the values of the mean and covariance functions and their 
relationships and, subsequently, influence the selection 
of the basis functions. The solutions { ( A ) } to the 
integral equation are ordered by the eigenvalues such 
that regions of the interval A which have large variance 
are weighted more heavily. A change in the spectral 


76 


classes such that the means are further apart, will cause 
an increase in the variance along the coordinates in which 
there was an increase in the distance between probability 
distributions . 

3.4 Spatial Representation 

The spatial representation parameter reflects the 
ability of the sensor to represent the spatial characteris- 
tics of objects in the scene. Spatial characteristics 
may include the size, orientation, and texture of objects 
as well as the distance and direction from other objects 
in the scene. In image-oriented pattern recognition 
systems the spatial representation parameter is paramount 
since the spatial characteristics are information bearing 
features; whereas, in numerically oriented systems the 
spatial representation is less important but significant. 

The fundamental quantity for spatial representation 
is the ground resolution element size. The ground resolution 
element is the area of the earth's surface which is being 
observed by the sensor at a given instant of time. A 
physically realizable sensor system is constrained to ob- 
serve an area of finite size. The area of the ground 
resolution element is determined by the sensor's instantane- 
ous field of view (IFOV), altitude, velocity, and scan rate. 

The size of the ground resolution element determines 
what information classes can be observed. If the size of 
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an object is denoted by F and the size of the ground 
resolution element is denoted by a, three relationships 
between F and A can be expressed: a > F, A ~ F, and A < F 

as shown in Figure 3.6. If the size of objects or fields 
is smaller than the resolution element size it is very 
difficult to identify them. If the object size and reso- 
lution element size are about the same, the performance 
is marginal, principally because the center of the object 
differs from the center of a ground resolution element a 
significant percentage of the time. Quite often the ob- 
ject will occupy space in small portions of two or more 
resolution elements. The resulting mixed elements may 
have spectral response functions which are not character- 
istic of either the object or the surrounding area. 

The best case is when the ground resolution element 
is much smaller than the field size. For crop inventory 
applications the field size determines the approximate reso- 
lution element size required to keep root mean square 
error of area estimates below a specified level (CITARS 
experiment; see Harnage and Landgrebe, 1975). Results of 
the CITARS experiment indicate that the number of resolu- 
tion elements per field should be greater than forty to 
avoid the effects of boundary resolution elements. 

Having a small ground resolution element also 
provides more sample functions per class. As discussed in 
a previous section more sample functions will provide 
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better estimates of the statistics and allow more features 
to be used to represent the spectrum. 

It might seem that the smaller the ground resolution 
element the better the performance; however, the signal-to- 
noise ratio deteriorates with decreasing resolution element 
size. The energy available to the sensor decreases as the 
area observed by the sensor at a given time decreases . 

The resulting decrease in signal-to-noise ratio tends to 
cause a degradation in the overall system performance. 
Mobasseri (1973) has shown that an increase in the ground 
resolution element size corresponds to a significant 
improvement in the classification accuracy. It is assumed 
that the size of the fields or objects is sufficiently 
large as to not be a factor in these results. Also the 
spectral representation parameters, sample size, signal-to- 
noise ratio, and the set of information classes were held 
fixed. 

In this discussion only per-point or per-element. 
classifiers have been considered so far. Classifiers 
which incorporate spatial information to improve the 
performance have been developed. The ECHO classifier 
developed by Kettig and Landgrebe (1975) divides the 
scene into homogeneous objects. These objects are then 
classified on a per field basis. Since the decision rule 
decides to which class a field belongs on the basis of its 
mean vector and covariance matrix rather than the single 
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vector from a single point, a potentially faster and 
better classification can be made. 

In the experiment of Landgrebe, Biehl and Simmons 
(1976) , the ECHO and per element classifiers were compared 
for different ground resolution element sizes. The results 
are shown in Figure 3.7. Note that for the smaller reso- 
lution element sizes the spatial classifier is slightly 
better than the per-element. As the ground resolution 
element size increases the objects size become closer to 
the resolution element size and the ECHO classifier becomes 
essentially a per-element classifier. Also the per- 
element classifier improves as the resolution element size 
increases . 

Another effort to utilize spatial information is to 
generate texture features (Haralick et al, 1973; and Wiersma 
and Landgrebe, 1976). The texture features are numerical 
quantities which loosely correspond to some intuitive 
properties of textures which humans can perceive. The 
spatial resolution in this case affects the textures which 
one can observe. A fine resolution has a more detailed 
texture as in the respon'"'. variations due to the size and 
shapes of leaves. A coarse resolution is more sensitive 
to large scale textures such as the quilt-like patterns 
of agricultural fields. 

The choice of the spatial representation parameter 
depends primarily on the choice of information classes. 




Classification performance 1% correct) 
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Figure 3.7 Classification performance vs. spatial reso- 
lution using ECHO and pei-point classifiers 
(Landgrebe et al., 1977). 


82 


Tradeoffs may be required to achieve improved signal-to- 
noise ratios or larger sample sizes. The use of spatial 
classifiers is still in early development and quantitative 
results on the effects of spatial resolution are still 
limited. 

3.5 Signal-to-Noise Ratio 

For a given remote sensing problem the signal is 
the part of the received spectral response function which 
i3 information bearing, and the noise is that part- which 
is non -in format ion bearing. The performance of the 
pattern recognition system is dependent on the ratio of 
the signal to the noise (S/N) . For remote sensing problems 
this parameter is difficult to quantify. 

There are essentially three types of noise intro- 
duced into the pattern recognition system - scene noise, 
atmospheric noise, and hardware noise. The scene noise 
consists of the variations in the response which have no 
informational value for the remote sensing problem being 
studied. An example would be the variations in the response 
of the soil when an analyst is trying to discriminate be- 
tween two crops growing in the soil. Hence, the choice of 
information classes will affect the signal-to-noise ratio. 

The atmospheric noise includes variations in the 
absorption and scattering of the electromagnetic energy 
in the atmosphere. The visible regions of the spectrum 


tend to suffer mostly from scattering in the atmosphere. 

The infra’ * portions are very susceptible to absorption 
particularly in certain bands known as water absorption 
bands (Korb, 1969) . 

The noise generated in the sensor system hardware comes 
from the thermal and shot noise introduced by the optics, 
the detectors, and the electronics. In addition quantiza- 
tion noise is added by the sensor (Billingsley, 1975) . 

Of interest here, is the effect of the noise on the 
overall performance of the system and in particular on the 
choice of the spectral parameters. 

Intuitively one would expect the noise to be a limiting 
factor on the classification performance. Because of the 
randomness of the spectral response at the earth's surface, 
the probability distributions will overlap even if no atmos- 
pheric or hardware noise is added. Hence, in general there 
is some inherent classification performance which cannot be 
improved upon due to scene and atmospheric noise. However, 
noise introduced in the hardward can degrade this inherent 
performance. 

Several research efforts have been directed at 
determining the effect of noise on the system per formance. 

In each case the noise was modeled as additive white 
Gaussian noise. In an experiment reported by Ready et al 
(1971) pseudo-random noise was generated on a digital 
computer and added to multispectral data taken over an 
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agricultural scene. The classification performance was 
estimated for varying amounts of added noise power. 

The results showed that the overall classification per- 
formance decreased with an increase in the noise level. 
Also, it was shown that a class which was the most diffi- 
cult to identify with low noise levels suffers the most 
degradation when noise i3 added. 

In a similar experiment, usinq data taken by the MSDS 
scanner, Landgrebe et al (1976) , also, demonstrated the 
performance degradation due to added noise. An interesting 
result in this experiment was that the degradation was 
significantly less when a spatial classifier was used. 

In another research result Mobasseri et al (1978) 
studied the relationship between the spatial representation 
by the sensor and the signal-to-noise ratio. Noise was 
added to simulated multispectral data statistics, and it 
was concluded that the added noise reduced the class 
separabilities and degraded the classification accuracy. 

The effect of additive white Gaussian noise on the 


Karhunen-Loeve expansion can be demonstrated quite easily. 

2 

The covariance matrix for white noise with variance o 

n 

in N dimensions is 

fa 2 0 ... 0 1 


K = 
n 
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A linear transformation on the noise such as the transfor- 
mation determined by the Karhunen-Loeve expansion does not 
change K^. For additive white Gaussian noise the signal 
covariance and noise covariance are additive 


After the KL expansion the transformed covariance matrix 
is the diagonal matrix given by 


Y 


1 



0 • • • 


K = 


0 


Y 2 



0 

0 


Y N +0 n 


If the signal is of dimension N' then the eigenvalues for 

2 

the terms greater than N' are equal to o^. The plot of 
the locus of the eigenvalues corresDonding to the terms in 
the expansion is shown in Fiqure 3.8. The eigenvalues 
become constant at the value for N greater than N'. 

The signal-to-noise ratio for each channel, then, is 

Vv 

The weighted Karhunen-Loeve expansion can be used 
to good advantage when it is known that certain portions 
of the spectral interval have low S/N. By weighting those 


N' N 

Figure 3.8 The locus of eigenvalues for an N’ 

dimensional signal in white Gaussian noise. 


m 
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portions with high S/N more heavily, the eigenvectors 
will tend to be more sensitive to the regions with high 
S/N. In effect basis functions which have significant 
components from regions with low weights will have smaller 
eigenvalues; hence, they will be ranked lower in the 
ordering of the basis functions. 

In general noise from any source tends to make dis- 
crimination between information classes more difficult. 

The degree of the performance degradation depends upon 
the statistical separability of the classes. Improvements 
in the signal-to-noise ratios are most helpful when the 
separability is small. 

It is important to realize that one cannot simply 
specify a high signal-to-noise ratio without considering 
the other parameters. Because of the law of conservation 
of energy, the amount of received energy in a fixed 
spectral band over a fixed surface area at a given time 
is determined. Therefore, in order to improve S/N, it 
is necessary to modify the spectral representation parameter, 
spatial representation parameter, or both. 

We have listed one parameter from each of the five 
categories which is believed to be significant. It is impor- 
tant to note that a change in any one of the parameters — 
mean-square representation error e^, the size of the ground 
resolution element A, the signal-to-noise ratio, the number 
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of sample functions per class, or the set of information 
classes - frequently causes a change in the optimal value 
of one or more of the remaining parameters. 

One can conceive of an experiment in which a data 
set is constructed which is large enough to include several 
values for each of the parameters. An algorithm could be 
devised to optimize e q over the set of parameters with 
respect to a set of constraints which may be placed on a 
sensor system. At this time, however, a data set which 
would satisfy these requirements is not available. 

As stated before the spectral parameter is of primary 
importance in this investigation. Due to the dependence 
on the other parameters the conditions on the other param- 
eters must be stated. The size of the ground resolution 
element will be a constant for each data set. The same 
instrument will be used at the same altitude for all 
observations. Also, since the same instrument and calibra- 
tion procedures are used/ the noise due to the hardware 
will be constant. The noise due to atmospheric and scene 
variations, however, may change from stratum to stratum. 

The number of sample functions per class will vary, but 
in each case the number should be sufficiently large to 
obtain reliable results. The information classes will vary 
from location to location and for different dates of 


collection. 
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CHAPTER 4. EXPERIMENTAL SYSTEM kND RESULTS 

A software system has been developed which implements 
the sensor design procedure described previously. The 
software package basically consists of an algorithm to 
compute eigenvalues and eigenvectors, an algorithm to 
transform the data, a suboptimal sensor simulator, and a 
method of estimating classification performance. A very 
necessary part of the experimental system is the field 
measurements data library consisting of spectra taken 
over typical agricultural scenes. A block diagram showing 
the essential parts of the sensor design system is dis- 
played in Figure 4.1. This system has been implemented 
on the IBM 370/148 at the Laboratory for Application of 
Remote Sensing at Purdue University. 

This chapter begins with a description of the field 
measurements data base and how it is accessed to provide 
spectral data for the sensor system design. The software 
required to compute the eigenvalues and eigenvectors for 
an ensemble, to perform linear transformations, to simulate 
suboptimum sensors, and to estimate classification per- 
formance is described. The experimental procedure which is 
used to test the software system is presented and results 



k 


Figure 4.1 Spectral parameter design system. 


VO 

o 








91 


from using the system are displayed. An important by- 
product of the sensor design procedure is an increase in the 
understanding of the scene. Knowledge of some important 
scene characteristics is extracted with the optimal design 
system and procedure. The procedure is used to develop 
a proposed sensor design which is compared against the 
optimal design for each stratum. A discussion of the 
overall pattern recognition system performance using the 
proposed sensor is given. 

4.1 Field Measurements Data Base 

The field measurements data base consists of spectral 
samples taken with very fine spectral resolution by the 
Field Spectrometer System (FSS) mounted in a helicopter. 

The spectral resolution was 0.02 micrometers for the inter- 
val from 0.4 to 2.4 micrometers. The spectra that will be 
used to test and evaluate the method developed here were 
collected over each of two sites at three different 
times of the year. 

Field data were taken over Williams County, North 
Dakota on May 8, June 29, and August 4, 1°77. The three 
principal information classes are SPRING WHEAT, FALLOW, 
which are fields plowed regularly to conserve moisture, 
and PASTURE. For the May 8, observation date the wheat 
was about 8 cm high so that the wheat field would be 
expected to have spectral characteristics very similar to 
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bare soil; hence, one would expect that it would he quite 
difficult to distinguish between the WHEAT and the FALLOW 
classes. The second date, June 29, provided data during 
the period of the growing season when the wheat is full 
grown and is typical of green vegetation. The final date, 
August 4, provided a data set containing fields with mature 
wheat. Some of the wheat fields were harves :ed by August 4; 
making it necessary to add the class HARVESTED WHFAT . 

A second location in Finney County, Kansas was chosen 
an an example of similar classes in a different location. 
Three dates, September 28, 1976, May 3, 1977, and June 
26, 1977, were chosen corresponding to the growth stages 
emerging, full canopy, and mature. Other crops in nearby 
fields, notably grain sorghum, are ripe on the fall date 
and emergent on the spring date. The information classes 
used for this data set are WINTER WHEAT, FALLOW, and OTHER 
CROPS. 

The data sets are assembled and stored on disk in a 
format that is used by all routines that require access 
to the data. Details of the data set assembly along with 
the data storage format specification are described in 
Appendix B. Also, in the appendix complete information 
on each of the six data sets is listed. 
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4.2 Spectral Parameter Evaluation System 

The key system elements in the spectral parameter 
evaluation system are the processors SPOPTM, which com- 
putes the optimal basis functions, SPTES which uses 
the basis functions to transform the data, and SPSUB which 
simulates suboptimal sensors (Figure 4.1). 

The computation of the optimal set of bas , functions 
for an ensemble is accomplished by solving the matrix 
equation 


0 r = KW * (4.1) 

to get the eigenvalues Yj# •••# y n and the eigenvectors 
$ 1 , 4 > 2» •••» . The matrix $ is the matrix of eigenvectors, 

<t> - [ , 4>2 / • • • > 4 > n ] and r is the diagonal matrix of 
eigenvalues. 


r 




o 


o 


N 


The matrix W is a diagonal matrix of weight coefficients 


w 
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K is the covariance matrix for the ensemble. Let the mean 

T 

vector for the ensemble be M = [m , m , . .., , then 

12 .« 



E { (x^-m^^) 


(Xj-m^.) } 


The unbiased estimate is 



N 


N~-l . L . 
s k=l 


(x ik - mi )(x jk -mj) 


(4.2) 


(4.3) 


where is the number of sample functions in the ensemble. 

Note that in general the stochastic process is non- 
stationary. A zero-mean process is defined to be stationary 
in the wide sense if the covariance function depends only 
on the difference | X — C | (Papoulis, 1965). That is, 

K ( A , £ ) = K(A-f.) A , £ e A 

The covariance matrix of a stationary process has elements 
which are equal along the diagonals. The methods used 
to compute the covariance matrices and to compute eigenvalues 
and eigenvectors are valid for both stationary and non- 
stationary stochastic processes. 

Let A be the matrix product of the covariance matrix 
K and the diagonal weighted matrix W. 


A 


KW 


(4.4) 
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If the weighting matrix W is equal to the identity matrix 
I, then the kernel A is real symmetric and the solution 
to 4.1 can be found using a standard numerical algorithm 
known as the Jacobi method (Wilkinson, 1965, p 266) . The 
Jacobi method uses a sequence of similarity transformations 
to reduce a real-symmetric matrix to a diagonal matrix. 

This method is very stable and provides all of the eigen- 
values and eigenvectors with good precision. 

However, if W is not the identity matrix, then, A is 
not symmetric. An algorithm which solves the eigenvalue 
problem for real general matrices was published by Grad 
and Brebner (1968) . This algorithm, EIGENP , computes the 
eigenvalues by the QR double-step method and the eigen- 
vectors by inverse iteration. Some comments on the 
application of the algorithm to the specific computer 
used here were published by Niessner (1972) . 

The complete algorithm package consists o', the main 
subroutine EIGENP and four callable subroutines SCALE, 
HESQR, REALVE, and COMPVE . Subroutine SCALE scales the 
matrix so that the absolute sums of corresponding rows and 
columns are roughly equal. The scaled matrix is then 
normalized so that the Euclidean norm is equal to one. 

These two preliminary modifications are carried out to 
improve the accuracy of the computed results. In HESQR 
the scaled matrix is reduced to upper-Hessenberg form by 
Householder's method. The QR double-step iterative process 
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is performed on the Hessenberg matrix to reduce the matrix . 
to diagonal form within the computational accuracy limits, 
where the elements along the diagonal are the eigenvalue. 
The inverse iteration process to find corresponding eigen- 
vectors is carried out in REALVE for real eigenvalues and 
COMPVE for complex eigenvalues . Since it has been shown 
that the eigenvalues will be real for the application under 
consideration, there is no need to include COMPVE. 

Both the EIGENP algorithm and the Jacobi method have '' 
been tested on the same covariance matrix using the 
identity matrix as the weight. The differences using the 
two methods were negligible even for matrices of order 100. 

A necessary part of the Karhunen-Loeve expansion is 
the ordering of the eigenvalues and corresponding eigen- (t 


vectors. Since the eigenvalues are not ordered in the 

: ' • ' 1 

eigenvalue algorithm a sorting routine was added to the 

system to perform this task. fj 

The set of ordered eigenvectors (<f>^(X)} will be used); 

// . 
// 

to perform a linear transformation on the original data 

/ 

vectors X. To perform the linear transformation the 

// 

■ u 

coefficients corresponding to each eigenvector arev computed 


Instead of the vector X, the waveforms are represented 

by the set of coefficients {x. } where P 

J 1 



I [x( X) - m( X) ] 
A 


* i (X) w( X ) dX 


( 4.5 



This transformation on the field data i 3 performed in the / 
program SPTES. 

The statistics for each information class are needed 
to evaluate the probability of correct classification. 

The data set, now represented by the transform coefficients 
is partitioned into classes and^the corresponding mean 
vectors and covariance matrices are computed in SPTES. 

The maximum likelihood estimates are used for the mean 


vectors and covariance ^trices. 

The routine SPSUB was developed to simulate several 
suboptimal sensors. A set of N basis functions { ^ . { X ) > 

X 

is stored in memory where each function is approximated 

by a 100 element vector. As an example, a set of four 

/ 

vectors, ty^(X), ^(X); ^ 3 (X), (X) was .implemented where 

j : 



x . 

i 


- A ^ 

1+1 


elsewhere 



(4.7 


The endpoints X^ and X^ + ^ are given under sensor number 1 
in Table 4.1. The basis functions may be normalized by 
requiring 


f ['J'. ( X) 3 w(X) dX = 1 


(4.8 



Q 




Each waveform in the ensemble is approximated by 


4 


x(A) * l x. ip. (A) 
i=i 11 


(4.9) 


where 


x i " 


x( A) ip. (A) w(A)dA 
J A 1 


(4.10) 


For the normalized basis functions the expected mean-square 
representation error over the ensemble, is given by 


E fej } 


fr-^. 4 


[x( A ) - l X. Ip . ] w( A) dA 
L J A i=l 1 1 


(4.11) 


A second sensor which has been considered for practical 


rs 




Implementation and which has band edges given under sensor 

r <\ * 


A 


number two in Table 4 . l^ihas , also, been included in the 

' i:. . 

routine SPSUB 


■O 


Table 4 . 1 Spectral band locations for two practical 
sensor designs. 


’ Sensor Number 1 
Band Wavelength 


Sensor Number 2 


Band 


Wavelength 


1 

2 

3 

4 


0.5 pm to 0.6 ym 
0.6 pm to 0.7 ym 
0.7 pm to 0.8 pm 


0.8 ym to 1 . 1 ym 




Q 

1 

2 

3 

4 

5 

6 


0.45 ym to 0.52 ym 
0.52 pm to 0.60 ym 
0.63 ym to 0 .69 ym 
0 . 76 ym to 0.90 ym 
1.55 $m to 1.75 ym 
2.08 ym to 2. 35 ym 
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The output of both SPTES and SPSUB is a set of 
statistics from which it is desired to evaluate the global 
performance criterion of probability of correct classifica- 
tion. In pattern recognition terminology the estimation of 
the class conditional statistics is the training phase or 
design of the classifier. It now remains to use these 
training sets to compute the performance, A Monte Carlo 
technique has been developed to evaluate the probability 
of correct classification integral. The details of the 
technique and an evaluation of an algorithm, SPE£>TM, 
designed to implement the technique are covered in Appendix 
A. A sufficient number of representative spectral response 
functions to represent the. stratum is necessary in order 

J 1 '. .,r 

to obtain a good estimate of the statistics. Experience 
with the performance estimator algorithm has demonstrated 
that the algorithm is reasonably efficient in terms of 
execution time and accuracy. 

4 . 3 Experimental Procedure 

In this section some comments concerning the pro- 
cedures for the operation of the spectral parameter 
design system are made. These procedures are followed 
in generating the results that are given in later sections. 

A stratum is selected by choosing a location and 
collection date for which a set of field data hair, been 


t 


acquired and stored in the field measurements library. 

Sample spectral response functions are selected from the 

field data to represent the stratum. This selection is 

ft 

accomplished by specifying the tape that a particular data 
set is stored on and the date on which it was collected. 
Details concerning this procedure are covered .-in Appendix B. 
The deck of cards , containing the numerical values of the 
spectra, is read by SPRDCT which stores the response func- 

G * 

tions and some ID information onto a disk file. All of 
the analysis algorithms using the data require the data 
to be in the format described in the appendix. 

The estimate of the covariance matrix of the ensemble 
and the solutions to the matrix equation which gives the 

eigenvalues and eigenvectors are computed by the routine 

) . . . 

SPOPTM. A weight function which is stored as a vector in 

a callable subroutine is selected in SPOPTM. A subroutine 
is used to sort the eigenvalues and corresponding eigen- 
vectors such that the eigenvalues are in descending order 
of magnitude . 

An example of the output listing for SPOPTM which 
lists the first 30 eigenvalues is shown in Figure 4.2. 
Corresponding to each eigenvalue estimate is an estimate 
of the variance of the eigenvalue, an estimate of the 
variance of the eigenvector, and the expected mean-square 

' • ■ }J ' ■ . O 

representation error for using the Karhunen-Loeve expansion. 



/{ 
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LABORATORY FOR APPLICATIONS OF REMOTE SENSING 
PURDUE UNIVERSITY = 

SAMPLE FUNCTION INFORMATION 17 JULY* 1978 

EXP* NO..... •••••.••140 

NUMBER OF CLASSES ..... 

CLASS. ......... ........ . • ..........WHEAT 

NUMBER OF SAMPLE FUNCTIONS 658 

CLASS FALLOW 

NUMBER OF SAMPLE FUNCTIONS ....211 

CLASS.. UNKNOWN 

NUMBER OF SAMPLE FUNCTIONS ....682 

= ■■ o 


2 

3 

4 

5 

6 

7 

8 
9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 
21 
22 

23 

24 

25 

26 

27 

28 

29 

30 


N EIGENVALUE 
1 2671.2320 
‘ 624.0196 

38.0230 


17.3402 
16.6256 
5.3304 
2.8647 

1:1411 

1.4203 

0.8294 

0.6867 

0.6755 

0.6203 

0.5687 

0.4833 

0.3670 

0.3384 

0.2647 

0.2567 

0.2105 

0.1888 

0.1656 

0.1478 

0.1177 

0.1112 

0.0903 

0.0873 

0.0786 

0.0672 


VAR (GAM) 
18423.0508 
1005.3904 
3.7328 
0.7763 
0.7137 
0.0734 
0.0212 
0.0085 
0.0071 
0.0052 
0.0018 
0.0012 
0.0012 
0.0010 
0.0008 
0.0006 
0.0003 
0.0003 
0.0002 
0.0002 
0.0001 
0.0001 
0.0001 
0.0001 
0.0000 
0.0000 
0.0000 
0.0000 
0.0000 
0.0000 


VAR (PHI ) 
0.0003 
0.0004 
0.0023 
0.3666 
0 .3665 
0.0049 
0.0107 
0.0960 
0.1131 
0.0491 
0.0572 
2.4945 
2.5225 
0.2657 
0.1672 
0.0706 
0.1362 
0.1403 
0.7469 
0.7494 
0.1172 
0.1336 
0.1281 
0.1062 
0.2557 
0.2611 
0.6519 
0.6752 
0.1726 
0.3435 


MEAN-SQUARE 
716.056468 
92.036884 
54.013897 
36.673708 
20.048093 
14.717731 
11 .853076 


2< 

2. 

2 . 

1 

1 


•388302 
6.967997 
6.138592 
5.451897 
4.776416 
4.156083 
3.587406 
3.104095 
737054 
398652 
133988 
877260 
666809 
1.477964 
1.312349 
1 .164518 
1.046809 
0.935630 
0.845328 
0.756043 
0.679479 
0.612256 


Figure 4.2 Sample output from SPOPTM. 


The variances are computed using the results derived in 
Chapter 2. It should be remembered that the eigenvalues 


and eigenvectors are^estimated from random samples from a 
Gaussian stochastic process. The estimate of the first 
eigenvalue is 2671.23. This estimate is approximately 
unbiased and has a standard deviation of 135.7. Similarly 
the estimate of the norm of the difference between the true 
eigenvector and the estimate is approximately unbiased. 

The standard deviation is .02. It is interesting to note 
that the variance for the 12th and 1.3th eigenvectors is 
relatively large. Recalling that the expression for the 


variance is sensitive to eigenvalues which are close 

. V 

together, the large variances are not; surprising. The 
mean-square error is computed using the eigenvalue estimates 
ID information concerning the data set is included^Sor 
reference. The eigenvectors are punched and stored in a 
card data file. A plotting routine is used to display 
the eigenvectors. Also, the eigenvectors will be used 
later to perform linear transformations on the data. 

A crude approximation to the system measurement 
error, introduced in making the field measurements, is 
used to provide a comparison with the expected mean-square 
representation error. Measurement error was assumed to 
be 7% of the numerical response value. If x(A) is the 

true signal and s(A) is the measured signal including 

\ 

added noise , then , the measurement error is 



8 
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= ( 


= I [x(A) - s ( A)] dA 

m J A 


(4.12) 


In discrete form let x, - s. 

k k 


.07 x^ and 


e m = l (*°7 V 

m k=l k 


(4.13) 


The average over the ensemble is the estimate of the 


expected measurement error. 

The linear transformation on the original data set 
using the computed eigenvectors is performed using SPTES. 

The statistics for the first N terms or features are 
computed for each class and displayed on the printer 
(Figure 4.3). Also a card deck with the statistics stored 
on it is punched for use with the classification performance 
estimator. 

The estimate of the probability of correct classifi- 


cation is obtained by SPESTM (see Appendix A) . The statis- 
tics deck output of SPTES is designed to be identical to 
the required input for SPESTM. The output of SPESTM 
includes the conditional probability of correct classifi- 
cation for each class and the overall probability of 
correct classification (Figure 4.4). 

It is possible to evaluate the contribution of each 

feature to the separability of the classes. Feature selec- 
- / 
tion is performed using the SEPARABILITY processor in 


ru i 
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MEAN VECTOR 

22.1997 17.3667 0.9930 


-0e0932 


COVARIANCE MATRIX 
2576.0550 

-371.4817 716.0103 


-54.7175 

33.6030 


3.4695 

-6.0187 


24.5298 

-8.2161 


15.4123 


MEAN VECTOR 
-16.1261 -17.6414 


-1.4610 -0.2640 


COVARIANCE %ATRIX 
100.3152 

370.0427 144.1309 
111.1944 -5.3496 
-49.1371 33.2773 


48.6596 

-2.7137 


19.3354 


MEAN VECTOR 

-16.4757 -11.2648 -0.4962 


0.1673 


COVARIANCE MATRIX 
2112.1230 

-170.6249 160.9570 

50.6295 -32.0785 46.1451 

-14.4906 -1.3899 8.9443 18.5254 

Figure 4.3 Sample output of class conditional statistics. 


# 




<? ' ; . ■, ■■ .. ' 

PROBABILITY OF CORRECT CLASSIFICATION FOR CLASS 1 = 0,8308 

PROBABILITY OF CORRECT CLASSIFICATION FOR CLASS 2 * 0.8*50 

PROBABILITY OF CORRFCT CLASSIFICATION FOR CLASS 3 = 0.5773 

OVERALL PROBABILITY OF CORRECT RECOGNITION = 0.7509 

Figure 4.4 Sample output of classification performance estimates. 
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LARSYS (Phillips, 1973) in which the divergence is computed 
using various combinations of N' features. The feature 
sets are ordered according to the average pairwise di- 
vergence. This feature selection technique allows one to 
find the best feature set quickly without having to try 
all of the possible combinations. 

Two practical sensor designs are evaluated for 
comparison with the optimal design. The spectral bands 
used to simulate these sensors was presented in Table 4.1. 
The spectral bands are contained in SPSUB which uses them 
as a set of basis functions to represent the response func- 
tion. A linear transformation is performed on the data, 
and the statistics for each class are computed. The average 
mean-square error for the suboptimal representation is 
computed and printed. The statistics are again punched 
on cards in a format suitable for SPESTM. 

SPSUB can also be used to design a practical sensor. 
The program can be modified to include any choice of 
spectral bands desired. 

4 . 4 System Testing 

The system was exercised in an effort to determine 
its capabilities and limitations. The data sets taken over 
the two locations at different times were used in the tests. 
In particular it would be good to get some feel as to what 
would be a good choice for the weight function. Also the 



number of samples that are required will be important when 
specifying what is desired in future data sets . 

4.4.1 Reconstruction 

As a first test of the system we would like to demon- 
strate the capability of the first few terms in the Karhunen- 
Loeve expansion to reconstruct the original waveform. A 
sample spectral response function from an ensemble is 
selected and the coefficients in the expansion are com- 
puted. Using N* terms in the expansion the approximation 
to the original function is given by: 

N' 

x(A) = l x . <j> . (A) + m ( A) ' (4.14) 

i=l 1 1 

where m(A) is the mean function of the process. For this 
example a uniform weight function, w(\) = 1.0 for all AeA, 
was used. 

A sequence of graphs showing the original function 
x ( A ) as a solid line and the approximated function x(A) 
as a dashed line is shown in Figures 4.5a to 4.5h. Only 
the first term in the expansion is used in Figure 4.5a; 
the first two terms are used in Figure 4.5b, and so forth. 

It is readily observed that after a few terms the approxi- 
mation is very close to the original.. 

If the average mean-square error is computed directly 
using the equation 


WAVELENGTH! MICROMETERS I 


WfiVEU-NGTHI MICROMETERS ) 


3 TERMS 4 TERMS 

Figure 4.5 Reconstruction of a single spectral response 
function using from 1 to 3 terms in the 
expansion. 
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[ x { X ) - x ( X) J dX 


(4.15) 


and averaging over the ensemble, the value of E{g) is 
equal within numerical error to the value given by summing 
the eigenvalues 

00 

E (e) = l Y. (4.16) 

i=N'+l 

as predicted by equation 2. 

4.4.2 Choice of Weight Function 

An important part of the analysis procedure is the 
choice of the weight function w(X) to be used in the 
weighted Karhunen-Loeve expansion. Four different weight 
functions , which are displayed in Figure 4.6, were proposed 
and tested. Data taken over Williams County, North Dakota 
on May 8, 1977 was used to evaluate the different weight 
functions . Comparisons were made by evaluating the 
eigenvalues, eigenvectors and classification performances 
for each of the weight functions. 

The motivation for the development of the weighted 
Karhunen-Loeve expansion is demonstrated by using the first 
weight function which has a weight of one assigned to all 
wavelengths on the spectral interval (Figure 4.6a). 

The first four eigenvectors for this weight function are 
graphed in Figure 4.7. It is noted that the first 





WAVELENGTH (MICROMETERS) 


WAVELENGTH (MICROMETERS) 


Figure 4 . 


EIGENVECTOR 3 EIGENVECTOR 4 

7 First four optimal basis functions using 
weight function number 1 over Williams 
County, May 8, 1977 data. 
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eigenvector is dominated by the variance in the signal 
for a narrow interval near 1.9 ym. A similar peak for the 

y . y ;iv 

same interval occurs in the second eiadnvector. In the 
fourth eigenvector a similar result occurs for a region 
near 1.4 ym. These two bands near 1.4 and 1.9 ym corres- 
pond to water absorption bands which severely attenuate 
the electromagnetic energy passing through the atmosphere 
at these wavelengths. The sample spectral response func- 
tions have large variations in these bands which causes 
the eigenvalue algorithm to select one or more eigen- 
vectors which are sensitive almost entirely to the portion 
of the spectral interval corresponding to one of the 

i . . f 

■ - V, - 

water absorption bands. The source of these large varia- 
tions is traced to the calibration procedure during which 
a division by a small number occurs, resulting in the 
noisy signals in the respective bands. The ability of 
eigenvectors 2 and 4 to, aid discrimination between informa- 
tion classes is limited and real contributions to the 
performance for these eigenvectors are due to the small 
but finite sensitivity in the remainder of the spectrum. 

The three remaining weight functions were chosen to 

minimize the effects of the water absorption bands . In 

\\ 

the second weight function (Figure 4.6b), the weight is 
set equal to .001 for the intervals 1.32 to 1.50 ym and 
1.76 to 1.94 ym and equal to one elsewhere. A more radical 


choice of weight function (Figure 4,6c) is based on the 
solar spectral irradiance at sea level (Handbook of 


Geophysics, 1961) . The solar irradiance is strongest in 
the visible and decreases to very small values in the 
infrared. c The two water absorption bands are accounted 
for as well as several other lesser molecular absorption 
bands. A criticism of this choice of weight function is 
that the reflectance from vegetation, for example, 
is very low in the visible while it is quite high in the 
infrared, which is the opposite of the solar irradiance 
curve. Hence, the third weight function, based on the 
irradiance curve will tend to give too much importance 
to the visible region and too little importance to the 


infrared regions / ^especially those between 1.5 and 1.7 ym 
and those between 2.2 and 2.4 ym. The fourth weight func- 
tion was chosen to weight the low reflectance typical of 
the visible region lower. It has slightly higher weight 
values for the two water absorption bands and has a weight 
of 0.7 for the visible region. The first four eigen- > 

f 

vectors for weight functions 2 and 4 are shown in 
Figures 4.8, 4.9, and 4.10, respectively. 

The expected value of the integral over A of the 
square of the response functions can be treated as a total 
received signal energy. This expected energy is equal to 
the sum of all of the eigenvalues, which is different for 
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Figure 4.8 First four optimal basis functions using 
weight function number 2 over Williams 
County, May 8, 1977 data. 










each weight function. The expected energies are 1985, 

550, 177, and 589, respectively where the units are 

<7 

relative to the units of per cent response for the spec- 
tral response, functions. Ideally, the weight function 
would reduce the energy which is noise and retain that 
which is signal. By using low weights in the water absorp- 
tion bands, an improvement in overall signal-to-noise ratio 
has been gained. However, in the case of the third weight 

‘•Tn 

function, the reduction in energy may have been too much. 

The infrared regions are not represented significantly in 
any but perhaps the second eigenvector. 

As a final comparison between weight functions, 

n 

the classification performances are examined. In Table 4.2 
the estimate of the probability of correct classification 
as a function of the number of terms in the expansion , f 
for each of the four weight functions. For ten terms it 
appears that the second and third weight functions are the 
better choices with the second weight function demonstrating^ 
a slight advantage in the first few terms. The conclusion;—^ 


drawn at this point is that the second weight function is 
the most reasonable choice and will be one used in the 
results that follow. 7 


C: 
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Table 4.2 Comparison of the probability of correct 

classification using N terms in the weighted 
Karhunen-Loeve expansion among four choices 
of weight functions. 


Weight Function 


N 

1 

2 3 

4 

1 

. 355 

.467 . .468 

.488 

2 

.443 

.729 .675 

.730 

3 

.729 

.819 .700 

.799 

4 

.736 

.833 .806 

.817 

1 5 

.742 

.851 .853 

.822 

J 6 

s 794 

.882 .896 

.834 

7 

.807 

.894 .897 

.851 

8 

.823 

.94 3 J ' |.914 

.860 

9 

.851 

,.94 3 .956 

.949 .954 

) 

.389 

10 

.862 

.9 31 

4.4.3 

Evaluation 

of the Eigenvalue 

Algorithm 


The methods employed in the algorithm EIGENP have 
been well-studied (see Wilkinson, 1965) and are characterized 
by good numerical stability and accuracy even for covariance 
matrices which have a rank of 100. The accuracy of the 
algorithm depends largely on the particular machine on 
which the algorithm is implemented. The accuracy is pro- 
portional to the rank of the matrix, to the number of 
iterations required for the iterative procedures used, and 
to 2 where t is the number of significant digits in the 
mantissa of a binary floating point number. For the IBM 370 
machine using double-precision the value of t is 56. Typi- 
cally eigenvalues can be computed which are accurate to 


iii. 
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c^/six decimal places. The norm of the difference between a 
computed eigenvector and the true eigenvector is also on 
the order of lO -- ^. 

The accuracy of the computed eigenvalues and eigen- 
vectors deteriorates slightly with the introduction of 
the weight matrix. Weight matrices containing small 
weights tend to cause underflow conditions to occur in the 
reduction to Hessenberg form. 

Computation times for matrices of rank 100 are on the 
order of 10 minutes of CPU time. Hence, one is restricted 
somewhat in using this algorithm a large number of times. 

4.4.4 Sample Size 

The number of sample functions, used to represent 
the ensemble, influences both the estimates of the eigen- 
values and eigenvectors and the estimate of the classifica- 
tiori performance. The prediction of the general effects 
of the sample size have been described earlier; however, 
it would be desirable to demonstrate these effects in 
the context of the present problem for the purpose of 
deciding whether or not a sufficient number of samples 
were collected. 

An experiment was performed using the data taken over 
Williams County on August 4, to demonstrate the effect of 
sample size. Subsets of the ensemble were used to simulate 
small data set sizes of 55, 110, and 294 sample functions. 
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The breakdown in the number of samples from the four 
information classes is shown in Table 4.3. The eigenvalues 
and eigenvectors were computed using 55, 110, 294, and 1444 
samples, respectively,. Several sample functions, which 

if 

were used to compute the Eigenvalues, were not used to 

( 

evaluate the performance because they were from fields in 
which there was some uncertainty as to which cover type 
the functions belonged. The eigenvalues and eigenvectors 
for each case were computed using the second weight func- 
tion, and the expected mean-square error was plotted as a 
function of the number of terms in the expansion in Figure 
4.11. The effect of sample size on mean-square error is 
most detectable for the number of terms greater than ten. 

It is observed that the expected mean-square error increases 
with increasing sample size. 


Table 4.3 

Sample 

County, 

size assignments 
N.D. on August 4 

for data 
, 1977, 

from 

Class 


NUMBER OF SAMPLES 


WHEAT 

25 

60 

134 

808 

WHEAT HAR 

5 

10 

22 

34 

FALLOW 

15 

25 

76 

330 

PASTURE 

10 

15 

62 

130 


55 


110 


Total 


294 


1,302 


NUMBER OF TERMS IN EXPflNSION-N 


Figure 4.11 Influence of sample size on expected mean- 
square error for Williams County, August 4, 
1977, using weight function number 2. 
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The classification performance was evaluated for each 

A' 

additional term in the sequence and a plot of P^ as a 
function of the mean-square representation error was drawn 

Jj 

for each case (Figure 4.12). The small sample function 

A 

size has two effects on Pc vs. E (e } curve. First the 

r 

smaller mean-square error causes the curve to be further 
to the left than it should be. Second, the small sample 
size causes the performance to be higher than it should be 
for a given expected mean-square error. 

The question of whether or not the set of samples 
adequately represents a stratum is a difficult one. In 
particular the method of selecting which functions to 
include in the sample is not easy to determine. One reason 


; is that relatively few sample functions are available and 

as in the case of this research one uses all the functions 
that are available. This experiment demonstrates the 

effects if we assume that the 1444 sample functions accurate- 

! • 

ly represent the ensemble. Certain trends indicate that 
| the number of samples available is adequate. The change 

I \ - 

I in the expected mean-square error is quite small between 

I the curves 297 and 1444 samples in Figure 4.11. Also, 

1 the performance as a function of representation error in 

I Figure 4.12 is probably close to accurate for the largest 



sample size. In the following the ensemble will consist 
of all of the sample functions that are available which 
is on the order of 1000. It should be pointed out that 
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Figure 4.12 Influence of sample size on the estimate of 

classification performance for Williams County, 
August 4, 1977, using weight function number 2. 
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if there are a larger number of classes, a larger number 
of samples will be needed to obtain good classification 

performance estimates using high dimensionality. 

\ o' 

4.4.5 Results 

The analytical procedure for spectral parameter de- 
sign of sensor systems was performed using the data col- 
lected on three dates over each of two locations. Results 
from using the experimental system are presented graphically 
in Figures 4.13 through 4.30. The three collection dates 
for Williams County, North Dakota, are presented first 
followed by the three data sets from Finney County, Kansas. 
Weight function number two was used for all cases. 

For each data set the expected mean-square error 
is plotted as a function of the number of terms used in 
the Karhunen-Loeve expansion. A logarithmic scale is 
used for the mean-square error because of the large range 
of values. The units for the mean-square error are rela- 
tive to the units on the spectral response function which 
are in terms of percent reflectance; Since 

00 

= l Y ■ t (4.17) 

i=l 1 . 

the units of error are relative to the expected mean- 
square value of the response functions in the ensemble., 
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f t 


E 


[x ( X) ) w { X) dX 
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Figure 4.13 Expected mean-square error as a function of 
the number of terms in the Karhunen-Loeve 
expansion for Williams County, May 8, 1977, 
using weight function number 2. 
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Figure 4.14 (cont.) 
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Figure 4.15 Estimate of probability of correct classifi- 


cation vs expected mean-square error for 
Williams County, May 8, 1977, using weight 
function number 2. 

/ 


v’ •- • 



NUMBER OF TERMS IN EXPRNSION-N 


Figure 4.16 Expected mean-square error as a function of 
the number of terms in the Karhunen-Loeve 
expansion for Williams County, June 29 , 1977 
using weight function number 2. 
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Figure 4.18 Estimate of probability of correct classifi- 
cation vs expected mean-square error for 
Williams County, June 29, 1977, using 
weight function number 2 . 
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Figure 4.19 Expected mean-square error as a function of 
the number of terms in the Karhunen-Loeve 
expansion for Williams County, August 4, 1977 
using weight function number 2. 
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Figure 4.20 First twelve eigenvectors for Williams 

County , August 4, 1977, using weight function 
number 2. 
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Figure 4.20 (cont.) 
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Figure 4.21 Estimate of probability of correct classifica 
tion vs expected mean-square error for 
Williams County, August 4, 1977, using weight 
^ function number 2. 
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Figure 4.24 


Estimate of probability of correct classifica- 
tion vs expected mean-square error for 
Finney County, September 28, 1976, using 
weight function number 2. 
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Figure 

4.26 First twelve 
May 3, 1977 , 

eigenvectors 
using weight 

for Finney County, 
function number 2, 
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Figure 4.26 (cent.) 
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Figure 4.26 (cont.) 
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Figure 4.29 First twelve ei 
June 26, 1977, 
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Figure 4.29 (cont.) 
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Figure 4.29 (cont. 
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Figure 4.30 Estimate of probability of correct classifica 
tion vs expected mean-square error for 
Finney County, June 26, 1977, using weight 
function number 2 . 
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Tile first twelve weighted eigenvectors for each set of 

.//■.. 

data iare shown. Note that the graphs are of weighted 
eigenvectors; that is, F^(A) where 

P i (A) = <f> i ( A ) w( A) (4.13) 

is plotted as a function of wavelength. The weighted 
eigenvectors will be used to determine effective ways of 

ji 

sampling the spectrum. 

The important relationship between probability of 
correct classification and expected mean-square error is 
depicted in the graphs of P vs E { e } for each data 

C 3T 

set. Starting with the first eigenvector, the values of 

a 

P and E { e } are plotted as the number of terms in the 
c r 

Karhunen-Loeve expansion is increased up to ten terms. 

Again a logarithmic scale is used for the mean-square error. 

4.5 Scene Understanding 

Although the primary thrust of this research was to 
arrive at an analytical approach to sensor design, it has 
beneficially resulted in some important contributions 
to scene understanding. Four important characteristics 
of the scene can be studied using the analysis procedure 
that has been developed, here - the dimensionality of 
the observation space, the determination of the important 
regions of the spectrum, the relationship between spectral 
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representation and classification performance, and the 
maximum achievable classification performance. These 
characteristics are evaluated over the limited number of 
data sets available. 

4.5.1 The Dimensionality of the Observation Space 
The dimensionality of the observation space is 
determined by the minimum number of basis functions re- 
quired to reduce the expected mean-square representation 
error to a value below a specified level T. The problem 
becomes that of determining an appropriate value for T. 
Consider the expected measurement error discussed earlier. 
This measurement error is an attempt to quantify the 
capability of the field data gathering system to make 
accurate measurements. If the value of T is much less 
than the expected measurement error, then, one would expect 
that no real improvement in performance may be achieved 
by increasing the number of terms in the expansion. 

The expected measurement error for each of the six 
data sets is listed in Table 4.4. Two choices for T will 
be considered. First, let the ratio of the expected 
measurement error to T^ be ten-to-one. The number of 
terms required to reduce the expected mean-square repre- 
sentation error to less than T^ is six in all but the 
first data set where only five terms are required (see 
Table 4.4). Six terms appears to be a very reasonable 






Table 4.4 Expected measurement error and proposed values for T for each 

of the data sets. /J^\ 


( <Ht 

i\j(( 

| 


Expected 

Measurement 


Number of 


Number of Number of terms 


\ 

w 


Data Set 

Error 

T i 

Terms for T^ 

T 

2 

Terms for T 2 

R < .99 

Williams Co. 
May 8 , 1977 

178.8 

17.9 

5 

1.78 

18 

■~."j 

10 

Williams Co. 
June 29, 1977 

140.7 

14.1 

6 

1.41 

20 

4 

Williams Co. 
Aug. 4, 1977 

127.4 

12.7 

6 

1.27 

17 

5 

Finney Co. 
Sept. 28, 1976 

103.9 

10.4 

6 

1.04 

18 

5 

Finney Co. 
May 3, 1977 

156.6 

15.7 

6 

1.57 

22 

5 

Finney Co. 
June 26, 1977 

165.3 

16.5 

6 

1.65 

19 

6 


f 






15.8 
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y I 

number considering both the representation accuracy and | 

the data volume required to be transmitted to the processor. 

However, for the purposes of this work it is desirable 
to decrease T further to insure that as much information 
as possible is retained. Therefore, let the ratio of 

the expected measurement error to T^ be one hundred- to-one . jj 

The number of terms required to reduce the expected measure- j 

ment error to a value less than T 2 is approximately twenty. j 

A second criterion for determining how many terms 


in the expansion to use which has often been applied is 1 

to compute the ratio I 

N . 1 


R 




(4.19) 


where N is the number of terms in the expansion and L is 
the total number of terms available. If R is equal to 1.0 
then the expected mean-square error for the process is zero. 
In general this occurs only when N=L, therefore, one must 
be content with choosing of value of R close to 1.0. 

Suppose that we choose R = 0.99 and require that the number 
N be chosen such that the right-hand term in equation 4.19 
is greater than R. This would guarantee that the expected 
representation error would be less than 1% of the total 
signal 'energy'. The last column in Table 4.4 lists the 
number of terms required to achieve this representation 


160 


A 

accuracy. The first twenty eigenvectors were vised in the 
analysis of the data which are presented in this research. 

4.5.2 Feature Selection / ,, 

It is desirable to evaluate the optimal set of basis 
functions to determine which features are contributing the 
most toward the discrimination between classes in a given 
problem. To evaluate the features it is proposed to rank | 
them according to their ability to discriminate between 
classes. This ranking will achieve three purposes. First 
the ranking will indicate whether the order of the features 
based on expected mean-square error is relevant to the 
classification problem. Second by examining the eigenvectors 

' ' ' '.‘i 

of the most significant features, some information regard- 
ing the selection of the best set of features to use in the 
classifier is obtained. Finally, the relationship between 
the observed spectral response variations and the phenomena 
being observed on the earth's surface can be examined more 
closely, since the most significant variations which affect 
separability can now be determined. 

For each data set, the information classes have been 
specified. The features in the optimal set will be 
evaluated based on the following criteria: 

- Estimate of probability of correct 
classification for each feature . 

- Computation of a separability measure 
(divergence) for combinations of features 
and ranking according to highest average 
separability. 
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- Estimate of probability of correct classifi- 
cation for combinations of features. 

The rankings for the first ten optimal features are 
listed in Table 4.5. Note that the rankings are somewhat 
subjective because the importance of a particular feature 
may be different when used in combination with other features 
than when used alone. However, those features at the top 
of the lists aie definitely superior to those at the bottom. 
For convenience the rankings are denoted by a number in 
parenthesis indicating the rank below each of the fiijst 10 

. . f) 

eigenvectors plotted m section 4.4.5. / 

In general, the ranking in Table 4.5 bears some similar- 
ities to the ranking based on expected mean-square error. 

For example, feature 1 is ranked first in two of the six 
data sets and second in two others while never being 
ranked below fifth. The low ranking for the May 8, 

Williams County data is not surprising since the first 
eigenvector is very similar to bare soil and the responses 
from both emerging wheat and fallow fields are similar to 
that characterized by bare soil. The first eigenvector 
would not be expected to be of much value for discriminating 
between the WHEAT and FALLOW classes. Feature 2 is also 
ranked high for all of the data sets. At the other end 
of the list features 9 and 10 are consistently at or near 


the bottom. 



Table 4.5 Ranking of the first 10 optimal features 
between classes. 


Rank May 8, 1977 June 29, 1977 Aug. 4, 1977 Sept 



their ability to discriminate 



12 1 

2 1 3 

4 5 2 

3 4 4 

7 - ' 3 6 

5 10 7 

6 8 5 

10 7 10 

8 9 8 

9 6 9 
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Only the first 10 features were ranked/ however, 
since twenty features were available a check was made 
of the second ten for features which may be significant. 

The importance of these features was determined by esti- 
mating the probability of correct classification in combina- 
tions with other features as well as by themselves. For 
the Williams County data sets features 11 and 12 were 
important for the May 8 data and for the 29th data. For 
the Finney County data sets features 11 and 13 were 
important for the Sept. 28 date while features 15 and 
14 were significant for the May 3rd and June 26th dates 
respectively. 

The evaluations of the spectral interval to select 
features for the classifier and to interpret observed 
phenomena will be discussed in the next sections. 

4.5.3 Classification Performance as a Function of the 
Spectral Representation 

The relationship between the overall pattern recog- 
nition system performance and the spectral representation 
parameter is graphically displayed by plotting the proba- 
bility of correct classification, P , as a function of 

c 

expected mean-square error, E (e^}. These graphs are 
plotted again in Figure 4.31 with the three graphs for each 
location on the same coordinates. One can evaluate which 
terms contribute to the performance as well as to the 
representation . 


August 4 



June 29 


MEAN-SQUARE ERROR 


Sept. 28 


■June 26 


. MEAN-SQUARE ERROR 


Figure 4.311 Estimate q £ probability of correct classifies 
I tion vs expected mean-square error for 
I (a) Williams County and (b) Finney County, 

I using weight function number 2. (See also 
I Figures 4.15, 4.18, 4.21, 4.24, 4.27. and 
I 4.30.) 
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The graph for the May 8, 1977, Williams County data is 

a 

typical. The value of P increases steadily with decreas- 

c 

ing E {e^}. At the fourth term the graph begins to level 

off at a value of 0.83, indicating that P is may be close 

c 

A 

to a maximum. However, at the eighth term the value of P 

* c 

increases ipdgnificantly for a corresponding small decrease 

," J * fS 

in E (r } before levelinq off at about P = 0.95. The June 
r c 

data set from Williams County has a similar graph with the 
final leveling off beginning at about the fifth term. 
Comparing these two data sets, a smaller mean-square 
error is required in the May data to achieve an equiva- 
lent classification performance. Hence fewer terms or 
dimensions are required to achieve a given level of per- 
formance. 

The last data set from Williams County does not 
exhibit the early leveling off noted in the first two-sets. 
The performance improves steadily until it reaches approxi- 
mately 1.0 at the seventh term. 

The September 28, 1976 data set from Finney County 
has a steady increase in performance with decreasing mean- 

A 

square error until the leveling occurs at about = .96. 

A 

Note that the value of for the first term is the highest 
of the six graphs; hence, a lot of discriminating information 
is present in the first term. The graph associated with 
the May 3, 1977, Finney County data set is still increasing 
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at the tenth term, indicating that more terms are necessary 
to determine the maximum performance. The graph for the last 
set from Finney County is similar to the graphs for the 
first two data sets from Williams County. 

The graph of P vs E { e } can be used to determine the 
c r 

degree of representation accuracy required to achieve a 
specified level of performance. For the data for Williams 
County on June 29 a relatively high value of E { e^_ } is 
acceptable; whereas, for the May data from Finney County 
requires a more accurate representation. 

For these curves there does not appear to be - any 
trends based on location of the data sets. There does seem 
to be a trend as far as the time of the growing season at 
which the data was collected is concerned. The May dates 
in both locations tend to require more representation accu- 
racy and tend to still be increasing in performance after 
using 10 terms. 

The asymptotic properties can be used to estimate the 
value of the maximum achievable classification performance. 

To find the maximum performance let E (e^) approach zero 

A A 

and observe the value of P . In most cases P will be 

c c 

constant or increasing very slowly as Ete^} becomes 
small. The value of the constant to which P^ is approaching 
is maximum value of the probability of correct classifica- 
tion. Table 4.6 lists the maximum probability of correct 
classification for each data set. Note that for the May 3, 
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Finney County, data, P is still increasing so that the 

c 

A 

maximum value of P^ is probably higher than that listed. 


Table 4.6 Maximum probability of correct classification 
for the six data sets. 


Data Set 


Approximate maximum probability 
of correct classification 


Williams Co. 
Williams Co. 
Williams Co. 
Finney Co. , 
Finney Co. , 
Finney Co. , 


, May 8, 1977 

.95 

, June 29, 1977 

.96 

, Aug. 4, 1977 

1.00 

Sept. 28, 1976 

.96 

May 3, 1977 

.93 

June 26, 1977 

.95 


4.5.4 Characteristics of the Eigenvectors 

For the six data sets there are some general charac- 
teristics of the eigenvectors which can be readily observed. 
The contribution of the spectral response to the channel 
or feature which corresponds to the eigenvector is 
determined by the portions of the spectral interval where 
the eigenvector has a magnitude or sensitivity different 
from zero. This sensitivity is apparent from the linear 
functional which determines the coefficients 


x. - x(x) <j> . (X) w ( x ) dx 
j A 1 


(4.20) 
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Therefor^ a subinterval of A which has relatively large 

values for <f>^(.X) and w(;\) will contribute significantly to 

the informational value of the coefficient x. . 

1 

i 

The eigenvectors provide some insights into the 
correlation between adjacent regions of the spectrum. Let 
the spectrum be sampled using very fine spectral bands. Let 
the measurements using these bands be denoted by u^, i=l,2, 
..., 100. The correlation between any two of the measurements 


is given by 


E (u.u. } 
1 1 


1 ^ik^jk Y k 


(4.21) 


where <j> . is the i^h element of the k^h eigenvector. If 

lK 

the correlation between two adjacent measurements u^ and 
u^ + ^ is high, then, the two measurements are not independent 
and they could be combined into a single measurement. 

It is now possible by examining the eigenvectors to deter- 
mine how narrow the spectral measurement bands should be 
in various parts of the spectrum. Eigenvectors which have 
high frequency variations in magnitude in a particular 
subinterval of the spectrum strongly indicate that it may 
be desirable to sample that subinterval using very narrow 
spectral bands. 

Referring to the results presented in section 4.4.5, 
the first eigenvector typically has the characteristics 
of the weighted mean function of the ensemble. The second 
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eigenvector has a strong component between .72 and 1.3 Mm 
and a second important component in the 1.95 to 2.4 y m 
band. For the Finney County data set taken on September 
28 e 1976 these two eigenvectors are reversed in order. 

The third and fourth eigenvectors in all of the data sets 
exhibit noticeable similarities. In the third eigenvectors 
Williams County data and the fourth eigenvectors for Finney 
County data there exists a significant component in the 
subinterval between 1.5 and 1.7 pm. The sensitivity in the 
visible region from .55 to .70 ym is strongest in the 
fourth eigenvectors for Williams County data and the third 
eigenvectors for the data from Finney County. These similar- 
ities over the different data sets are somewhat surprising 
and also encouraging in that these similarities indicate 
a strong possibility that a sensor can be built which will 
work very well over more than just a single data set. 

As eigenvectors which are later in the sequence $f 
optimum basis functions are examined, there is an increased 
occurrence of subintervals with high frequency variations 
in magnitude. It is of interest to note that several of 
these terms were important for classification performance. 
Examples of important eigenvectors which have high 
frequency variations are the sixth and eighth eigenvectors 
from the May 8, Williams County data and the seventh eigen- 
vector from the August 4, Williams County data. 
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It was observed that the subintervals from 0.6 to 
0.7 pm and from 0.9 to 1.1 urn have considerable high 
frequency variation. The 0.6 to 0.7 pm band has often 
been suggested as very important for identifying green 
vegetation. In particular, the chlorophyll absorption 
band centered at about 0.65 pirn is present (Hoffer, 1978). 
Differences in the chlorophyll pigmentation are indicators 
of plant stress. Other pigments are also present in the 
visible part of the spectrum. Therefore, there is good 
evidence that narrow spectral bands in the region between 
0.6 and 0.7 pm may be helpful. The spectral interval 
between 0.7 and 1.1 pm also possesses high frequency 
variations; however, some of these variations can be traced 
to water absorption bands occurring at 0.76, 0.93, and 
1.12 pirn. Furthermore tests using narrow spectral bands 
in this region did not improve the classification per- 
formance significantly over using a wide spectral band. 

The significant sensitivity of important eigenvectors 
in the spectral bands from 1.5 to 1.7 pm and 1.96 to 2.4 pm 
clearly indicates that these bands should be included in the 
design. The importance of including these two bands was 
further substantiated by improved classification performance. 

4.6 Suboptimal Sensor Design 

The analytical procedure which has been developed and 
tested is particularly useful^ as a tool for the design of 


practical sensor systems. Significant contributions to the 
design have been made through improved scene understanding; 
however, the primary purpose is to be able to design a prac- 
tical sensor system by specifying a particular set of 
spectral bands ( X )}. The optimal set of basis functions 
generated by the procedure provides a standard against which 
any suboptimal practical sensors can be compared. In 
addition, the optimum basis functions provide 

information regarding the proper choice of spectral bands. 

4.6.1 Comparison with Suboptimal Systems 

An important use of the optimal design is to use it 
as a standard for comparing suboptimal systems. Two subop- 
timal sensors similar to existing or future practical 
scanner systems were simulated using the spectral bands 
listed in Table 4.1. The basis functions for these 
two sensors are given by 



where the X are the endpoints listed in Table 4.1. 
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The sensors are first compared on the basis of expected 
mean-square error. In Table 4.7 the mean-square error for 
the two suboptimal sensors is compared with the optimal 
sensor for all six data sets. The mean-square error for the 
optimal sensor is shown, using the first four, six, and ten 
eigenvectors. The units of the expected mean-square error 
are relative and are significant for comparison purposes 
only. The second weight function (Figure 4.6b) was used for 
all error compuations . The large difference in mean-square 
error between the suboptimal and the optimal sensors is 
due to the fact that sensors one and two do not attempt to 
represent the entire spectral interval from 0.4 to 2.4 
micrometers. Figure 4.32 illustrates how a large contribu- 
tion to the mean-square for the suboptimal sensors results 
from the lack of spectral channels in large portions of 
the spectrum. 

Comparison can also be made on the basis of overall 
pattern recognition system performance. For each data set 
information classes were selected. The performance 
criterion was the probability of correct classification. 

The performance of the two sensors is compared with 
the optimal sensor in Figures 4.33 through 4.38. Using ten 
eigenvectors in the representation of the ensemble, the 
best four features and the best six features as determined 
by feature selection were evaluated. The choice of four and 
six features was made because suboptimal sensors one and 


Table 4.7 Comparison of expected mean-square error (in relative units) for 
each of the six data sets using two suboptimal sensors and 
the optimal sensors consisting of the first 4, 6, and 10 
eigenvectors . 


Data Set 

Sensor 1 

Sensor 2 

First Four 

First Six 

First Ten 

Williams Co. 
May 8 , 1977 

28570 

17340 

21.30 

11.04 

5.144 

Williams Co. 
June 29, 1977 

17320 

16380 

26.31 

±1.37 

5.253 

Williams Co. 
Aug. 4, 1977 

18070 

14010 

19.76 

9.315 

3.539 

Finney Co. 
Sept. 28, 1976 

13360 

11650 

18.19 

7.133 

3.035 

Finney Co. 
May 3, 1977 

22110 

16080 

36.67 

14.72 

6.968 

Finney Co. 
June 26, 1977 

23210 

17760 

26.19 

13,98 

5.769 
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Figure 4.32 Regions of the spectral interval which are not 
represented by a suboptimal sensor and which 
contribute heavily to the mean-square error. 


CL £E 



Figure 4.33 Comparisons of probability of correct cl 
fication for several sensors for Williait 
County, May 8, 1977. 
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two have four and six spectral channels respectively. 

The classification performances/ using the first four and 
the first six eigenvectors as well as the first ten eigen- 
vectors, are also provided for comparison. Several observa- 
tions can be made from comparing the performances. In 
general considerable improvement in classification per- 
formance can be achieved over that of sensor number one. In 
several cases the estimate of the probability of correct 
classification for sensor 1 one was significantly less than 
any of the other combinations of channels presented. Sub- 
optimal sensor number two does quite well, however, even 

i: 

approaching in some cases the performance of the optimal 
sensor using the first ten eigenvectors. 

For the chosen information classes, a very accurate 
representation of the original spectral response function 
is not required to obtain good performance. The information 
contained in the unused portion of the spectrum does not 
appear to be essential for the identification of these 
classes. However, for a set of information classes which 
are deeper in the information tree, a representation 
with smaller expected mean-square error may be necessary. 

There is evidence that measurements made by the 
optimal set of basis functions are uncorrelated. A measure 
of the correlation between any two measurements is the 
correlation coefficient given by 
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P ij = 


E{ (x.-m. ) (x.-m. ) > 
x x - 1 J 

E { ( ) 2 >E { ( x j -iru ) 2 } 


(4.23) 


Two measurements x. and x., j are said to be uncorre- 

l j 

lated if p. . =0. The matrix of coefficients is called 
X D 

the correlation matrix. From the properties of the Karhunen- 
Loeve expansion the off-diagonal correlation coefficients 
in the correlation matrix for the stochastic process corres- 
ponding to a stratum are zero. Therefore, the measurements 
on the process are uncorrelated. However, the class condi- 
tional correlation matrices in general do not exhibit un- 
correlated measurements (Bharucha and Kadota, 1969). In 
practice it was found that the the class conditional 
statistics are still relatively uncorrelated. As an 
example, the correlation matrices for the three classes 
from the data taken over Williams County, on June 29, 1977, 
the four band suboptimum sensor number 1 of Table 4.1 were 
computed. These matrices are listed in Table 4.8. The 
first two channels of the suboptim&l sensor are highly 
correlated and the third and fourth channels are highly 
correlated. The correlation matrices for the first four 
optimal basis functions over the same data set are 
presented in Table 4.9. There is considerably less 
correlation between any pair of channels in the optimal 
sensor for any of the three classes. The fact that the 
measurements are uncorrelated implies that the redundancy 
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of information in the measurements is minimized, and 
maximum performance can be achieved with a minimum number 
of features. 

Table 4.8 Correlation matrices for the four band suboptimal 
sensor number 1 using data taken over Williams 
County on June 29, 1977. 


Class WHEAT 


1.00 




0.99 

1.00 



0.45 

0.46 

1.00 


0.22 

0.24 

0.96 

1..00 


Class FALLOW 


1.00 




0.99 

1.00 



0.84 

0.83 

1.00 


0.68 

0.67 

0.95 

1.00 


Class PASTURE 


1.00 




0.99 

1.00 



0. 76 

0.81 

1.00 


0.68 

0.73 

0.99 

1.00 
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Table 4.9 Correlation matrices for the first four optimal 
basis functions using data taken over Williams 
County on June 29, 1977. 


Class WHEAT 


1.00 

-0.45 1.00 

-0.23 -0.35 1.00 

0.01 0.25 -0.15 1.00 


Class FALLOW 

1.00 

- 0.02 1.00 

0.24 0.30 1.00 

0.30 -0.30 0.02 1.00 


Class PASTURE 

1.00 

-0.62 1.00 

-0.79 0.30 1.00 

-0.34 0.07 0.49 1.00 

4.6.2 Evaluation of Spectral Subintervals 

In section 4.5.4 methods of evaluating the eigenvectors 
in order to determine how to select spectral channels for 
a practical sensor were discussed. Principally the eigen- 
vectors are examined to identify regions which are con- 


tributing to the information content of the scene. The 
weight function effectively eliminated two subintervals 
which were shown to be of little value. The factors which 
are important for identifying important subintervals are the 
magnitudes of the eigenvectors in these subintervals and 


it' 

r-g 
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their ranking with respect to representation error and with 
respect to classification performance. 

Examination of the eigenvectors has revealed that 
the spectral bandwidths of current sensor systems may be 
too wide in certain subintervals of the spectrum. Two 
subintervals in particular appear to have significant high 
frequency variations to merit narrow spectral-sampling 
channel widths. One of these subintervals from 0.9 to 
1.15 micrometers is known to have several minor molecular 
absorption bands which may be the cause of the increased 
high frequency variations. The subinterval from 0.6 to 
0.9 pm also possesses significant variations in the magni- 
tudes of the eigenvectors. This region is considered to 
be important for measurements on vegetation classes. There- 
fore, the proposed sensor design should reflect the 
importance of narrow sampling channels in the subinterval. 
Bandwidths as narrow as .02 ym may be required to achieve 
good performance. However, narrow spectral channels 
require more bands to cover the spectrum. The cost 
of adding more spectral channels which will cause greater 
data volume difficulties should also be considered during 
the design of the system. 

4.6.3 Proposed Sensor Design 

A proposed sensor is now designed using the techniques 
and knowledge that has been developed. It is desirable that 
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this sensor work well over all six data sets. The number of 
features or channels will be restricted to between six and 
ten. Only rectangular basis functions will be considered 
because they are orthogonal and simple to implement. The 
approach will be to pick a set of basis functions that will 
give a small expected mean-square error, and, then, compare 
the resulting classification performance with the optimum. 

The selection of spectral channels for the proposed 
sensor was based upon manual examination of the eigenvectors 
and upon use of equation 4.21 to locate adjacent uncorre- 
lated measurements of the spectrum. The eigenvectors over 
each band are studied with the intention of locating regions 
of the spectrum which need to be sampled with narrow spectral 
channels. The sampling measurements made by the field 
data collecting system are used to compute the correlation 
between measurements normalized to the respective variances. 
If two adjacent, spectral measurements are uncorrelated, a 
good choice for the location of the edge of a rectangular 
basis function might be between the two measurements. 

Graphs of the correlation coefficients as a function of 
frequency for each data set are included in Appendix C. 

It should be pointed out that even though these groups 
indicate that two points are not correlated, there still 
may not be much improvement in performance as a result of 
locating the edge of a channel between the two points. The 
fact that the edges of two channels are uncorrelated does 
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not guarantee that the spectral channels themselves are 
uncorrelated. Furthermore, the examination of the eigen- 
vectors and the computation of the correlations must be 
considered in light of the signal and noise properties 
across the spectrum. Therefore, the procedure is to design 
a proposed sensor using the principles discussed above 
and evaluate the system performance. 

The proposed sensor design was developed using the 
May 8, Williams County, data. The spectral band locations 
are listed in Table 4.10 where the basis functions are, 
again, given by 

if^U) = - 

The resulting design was tested on the remaining data sets 
and compared with the corresponding optimum sets of 
basis functions . 

The performance of this sensor design was very 
good. The expected mean-square error for each data set 
is given in Table 4.11. The expected mean-square error is 
on the order of 1000 which is a factor of 10 less than either 
suboptimal sensor one or two. This value though high 
with respect to the optimal sensor is probably about as well 
as one can do with a small number of rectangular basis func- 
tions. The classification performance is listed in Table 



4,11 and included in the bar graphs of Figures 4.33 to 
4.38 for comparison with the other systems. Performance 
is significantly better in several cases to either of 
the suboptimal systems and very close to the 10 channel 
optimal system in a number of the data sets. 


Table 4.10 Spectral band locations for the proposed 
sensor. 


Channel 

1 

2 

3 

4 

5 

6 

7 

8 


Endpoints 


.42 yin - 
.56 ym - 
.68 ym - 
.72 ym - 
.92 ym - 
1.02 ym - 
1.52 ym - 
1.96 ym - 


.54 ym 
. 66 ym 
. 70 ym 
.90 ym 
1 . 00 ym 
1 . 30 ym 
1 . 74 ym 
2 . 40 ym 
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Table 4.11 Expected mean-square error (in relative units) 
and estimated probability of correct classifi- 
cation using the proposed sensor. 


Data Set 

E {£.} 

P c 

Williams Co. 
May 8, 1977 

939 

.946 

Williams Co. 
June 29, 1977 

1700 

.969 

Williams Co. 
Aug. 4, 1977 

1016 

.995 

Finney Co. 
Sept. 28, 1976 

1068 

.953 

Finney Co 
May 3 , 1977 

1213 

.854 

Finney Co. 
June 26, 1977 

1241 

.966 
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CHAPTER 5. CONCLUSIONS AND SUGGESTIONS FOR 


FURTHER RESEARCH 


The purpose of this research was to develop an analyti- 
cal technique for selecting spectral channels as a part 
of the design of a multispectral scanner sensor system 
for remote sensing. The results and conclusions as a 
consequence of the development and implementation of this 
technique have been significant and are now summarized. 

The spectral representation parameter is one of five 
suggested interrelated parameters which influence the 
overall pattern recognition system performance criterion. 

The\ quantity associated with the spectral representation 
parameter was defined by the expected mean-square error. 

The stochastic process, consisting of an ensemble of 
spectral response functions from a stratum, was represented 
by a series expansion in a set of basis functions suitably 
weighted by coefficients. By increasing the number of bahis 
functions in the representation the expected mean-square 
representation error will decrease. The Karhunen-Loeve 
expansion was used to provide an ordered set of basis 
functions such that using the first N of them results in 
a minimum mean-square representation error over all 
possible choices of N basis functions. 


u. 
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//' 

The development of the Karhimen-Loeve expansion in 
Chapter 2 was generalized to include the possibility of 
using a weight function in order to weight the different 
portions of the spectrum relative to their importance. 

The motivation was the occurrence of strong but noisy 
spectral response variations in two regions of the spectrum 
corresponding to water absorption bands. Using the 
uniform weight function eigenvectors which were dominated 
by components in these bands were among the first five in 
the ordered sequence of optimal basis functions; however, 
their contribution to the overall performance of the system 
Was very small. By using a weight function which was 
unity except in the water absorption bands where the 
weight was very small, the eigenvectors containing 
significant components from these bands were no longer in 
the top ten or twenty eigenvectors. A very noticeable 
improvement in classification performance on a term-by-term 

basis was noted with the inclusion of this weight function. 

\ 

The analytical technique developed in this research 
has contributed to the understanding of the scene. The 
dimensionality of the observation space required to 
achieve sufficient representation accuracy to provide 
acceptible classification performance for the information 
classes was approximately six to eight. A more complex 
set of information classes may require more accurate 
representation which would necessitate using more basis 
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functions and increase the number of dimensions in the 
representation. The graph of the global performance 

i ) O' 

criterion, which is typically the probability of correct 
classification, as a function of expected mean-square 
error is useful for studying the relationship between 
the spectral representation and the overall system per- 
formance. For tb~ -information classes selected the 
graph of P versus E {e } allows one to estimate the 
maximum probability of correct classification and to 
study which eigenvectors are contributing the most to 
the classification performance. Also, the shape of this 
curve indicates whether or not the selection of the basis 
functions with respect to the mean-square error criterion 
bears any relation to the contribution to classification 
performance . The largest contribution to improved per- 
formance occurred when the first few eigenvectors in 
the sequence were used. However, in several of the strata 
used in this work it was found that eigenvectors that 
were sixth or higher in the sequence of optimum basis 
functions made important contributions to the classifi- 
cation performance. In general, there is good correlation 
between the ranking of the basis functions on the basis 
of classification performance and the ranking on the basis 
of minimizing mean-square error. 

An important aspect of understanding the scene is 
determining which portions of the spectral interval are 


i: - .OsM?" A .'J A* wi.,4 ••• -i 
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most useful. By examining which sub intervals are being 

• 6 , 

sampled from the eigenvectors which are most important 

0 v ' . , ■*- 

for classification purposes. One can identify portions 
of the spectrum which are important and subintervals 
which are strongly correlated with other subintervals. 

C\ - 

i \ " • ' , •; 

Vhe limited value of the subintervals corresponding to 
the water absorption bands near .1. . 4 and 1 . 9 micrometers 
was well-known and was verified in this research. 

It was observed that the plots of the eigenvectors^ 
which were later in the sequence tended to have increased 
high frequency variations. Coupled with the indication 
that these later terms : provide significant additional 
information for classification, it was concluded that 
some spectral regions may require a high spectral sampling 
rate. Bandwidth intervals of 0.02 ym may be required as 
compared to the 0.1 ym intervals used in the suboptimum 
sensor number one. Of particular importance was the 
indication of a need for fine sampling of the spectrum 
in the visible region corresponding to the chlorophyll 
absorption bands (0.55 - 0.70 ym) . 

The use of the weighted Karhunen-Loeve expansion was 
demonstrated to be a useful tool in the design of sensor 
systems. Two suboptimal systems which are similar to 
existing or planned operational sensor systems were 
compared with the optimal representation. For the 
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information classes used it was found that a very high 
representation accuracy was not necessary to obtain good 
performance. The practical sensors, which represented 
the spectral response functions very crudely, performed 
quite well compared to the system consisting of the set 
of optimal basis functions. However, there is a signifi- 
cant improvement in performance that can be achieved by 
a better representation in several of the cases. 

A proposed sensor design was developed using the 
design procedure. The proposed sensor consisted of eight 
rectangular bands selected on the basis of the information 
provided by the procedure. The performance of the pro- 
posed design was superior in classification performance 
to two other practical sensor designs and very much super- 
ior in representation accuracy. For the information classes 

used the classification performance of the proposed sensor 
was very close to the maximum possible in most cases. 

The conclusions drawn so far are based on a very 
limited collection of strata. To carry out the procedure 
such that the collection is representative of all possible 
strata that a given sensor may observe would require many 
more .sets of data. Suppose, for example, that it is 
desired to use a sensor to map vegetation in the United 
States. Only wheat growing areas of the central plains 
are represented by the two locations used in this work. 
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The spectral variations that are peculiar to agricultural 
scenes in the midwestern cornbelt, the small farms of 
New England, the southern cotton belt, or the fresh 
(r produce growing regions of the far west are not repre- 

| sented. Furthermore, other useful areas which may be of 
interest such as urban areas, forest lands, deserts, 

mountainous regions and large bodies of water .are not 

: . if % % 

, p. . K/ 11 ■■ 

included ip' the representation. At present the available 

data is primarily taken over the great plains; and the 
midwest. The helicopter-mounted sensor has proved to 
be an efficient method of gathering a sufficient amount 
of measurements in a short amount of time. The time- 
consuming effort that is needed is the collecting and 
correlating of ground truth information which will allow 
one to use various sets of information classes. 

An important concept which has been alluded to but 
which requires further investigation is the design of 
methods for insuring that the ensembles assembled are 
representative. Specifically it would be desirable to 
be able to make some quantitative assessment as to 
whether or not the collection of spectral response func- 
tions are representative of the ensemble associated with 
a stratum and whether or not the set of strata are 
representative of all possible strata which the sensor 
may observe. 
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The five parameters were discussed at some length in 
Chapter 3. A considerable body of research results has 
been collected relating each of these parameters to 
classification performance and in some cases showing the 
interdependence of the parameters. However , at present 
only limited attempts to vary all of the parameters 
simultaneously to arrive at some, optimal set have been 
reported. It is recommended that .sets of data be assembled 
which would allow one to vary all of the parameters. 

The available knowledge should provide guidelines for 
the proper design of such a collection. Recommended 
variables are the mean-square representation error, the 
ground resolution element size, added white noise power, 
number of training samples, and the information trees 
which correspond to the spectral representation, spatial 
representation, S/N, ancillary data, and information class 
parameters respectively. Also, it would be desirable to 
have available several other classifiers including a 
spatial classifier to evaluate performance. 




as ««**** 
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Appendix A. A Stratified Posterior Classification 
Performance Estimator 

A method was needed to estimate the classification 
performance for a maximum likelihood Gaussian classifier 
from a set of multiclass multivariate statistics. A Monte 
Carlo method may be used to evaluate the probability of 
correct classification integral. The method used here 
is based on the stratified posterior estimator developed 
by Whitsitt and Landgrebe (1977) (see also Moore, Whitsitt 
and Landgrebe, 1976) . 

Let X be an observation from one of M classes C., 

1 

i = 1,2,3/ . . . , M, with a priori probabilities P_^. The 
maximum likelihood decision rule can be stated as follows: 
Assign X to the class C^ if 

P(CJX) = max {P(C. 1 X) } 

* i 1 

where P(C^|X) is the conditional posterior probability 

for class C^ given the observation X. This rule partitions 

the observation space ft into subregions ft^ ; « 2 , ..., ft^, 

corresponding to the classes C^, C 2 , ..., C M , respectively. 

Define the indicator function 

( 1 , x e ft . 

1 

0 , x £ ft- 
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The probability of correct classification integral is 
given by 

r M 

= l P-j I i (x) p (x | C . ) dx 
c i=l 11 1 


It is desirable to evaluate the probability of correct 
classification for each class as well as the overall 
probability. The performance probability for the ith 
class is 


P = f I. (x) p(x|C )dx 
c i hi 1 . 1 


This integral is equivalent to the integral of the con- 
ditional density function whose support is . The 
overall performance, then, is 

M 

p = y p.p 

c . L , 1 c . 

1=1 1 


From Bayes ' rule 


P(C | x) p ( x) 
P (x | C . ) = — : 


P 


hence , 


c i 


f P(C. |x) 

I I. (X) P(x) dx 


where p(x) is the mixture density 




(A.l) 


(A. 2) 


(A. 3) 
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M 


p(x) = l P . p (x 1 C . ) 
j=l 3 3 


Therefore, 


M P 


c . 

l 


i f (x) P(c j x) p(x|c ) dx 

i=l i J Q 1 1 3 


(A. 4) 


Define 


Then, 


Q ± (x)' = I.(x) P(C i |x) 


* 

Q. (x) p (x | C . ) dx 
h 1 3 


is the conditional expected value of Q^(x) given that x 
comes from the class C ^ . The estimate of this expected 
value is 




N. 

! i (x l c j ) = ST JsVV 


j k=l 


Therefore the estimate of the probability of correctly 
classifying observations belonging to class is the unbiased 
estimate 



(A. 5) 
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(Whitsitt and Landgrebe, 1977) . The stratification refers 
to the sampling scheme used to obtain the estimate. A 
stratified sampling scheme takes advantage of knowledge 
of the classes to which the samples belong, whereas random 
sampling does not use the class assignment information. 

N. multivariate sample vectors are generated for class i 
from the given statistics. The maximum likelihood 
rule is used to determine the decision regions. Equation 
A. 5 is evaluated for the ISL sample vectors from each class 
and the total probability of correct classification 
is computed from equation A. 3. 

From equation A. 4 the term that must be evaluated is 


P(C i [ x) 


P. p ( x C.) 

l l 

I p k P< x ! c k > 


(A. 6) 


To evaluate this probability compute P^p(x|C^) for each 
class. Choose the largest value of P p(xjC ) which 
by the maximum likelihood decision rule will be P p(x|c^). 
The posterior probability is given by equation A. 6. 

The analysis so far can be applied to any probability 
measure. The remaining discussion will deal with the 
parametric case where the probability measure is Gaussian. 
That is 

) = “jr exp { — %(x-m, ) T K, 1 (x-m, ) } (A. 7) 

( 2 it) 2 j K k | 2 k k k 


p(x|c k 









210 


where m^ and are the L-dimensional mean vector and 
covariance matrix respectively for class k, 

O 

To reduce the number of computations required the 
linear transformation 


X = ^r-j/y + m i (A. 8) 

is introduced where <j>^ is the matrix of eigenvectors 
required to diagonalize the covariance matrix of class i, 
r\ is the diagonal matrix of eigenvalues and nu is the 
mean vector for class i. Slabs tituting equation A. 8 into 
A. 7, 

p(x|c k ) = (2 t )~% |K k l _Is exp ^[y’ r r i ^ * i T K k ” 1 


+ 2y T r.’ s (kij-V + <™ j -'\) T K k 1 <“:fVl 


In this form it is not necessary to perform the intermediate 

\\ 

computational step of transforming the generated random 
vectors to get the desired statistics. It is only necessary 
to generate M sets of random vectors y with expected 
value the zero vector and covariance matrix I and to use 
them in expression A. 8. 

The random vectors are generated using a pseudo=* 
random sequence of uniformly distributed random numbers. 


The subroutine RANDU from the IBM 360 subroutine package 
generates the random numbers and transforms them by the 
inverse cumulative-distribution-function method to 


obtain zero mean, unit variance, ^Independent Gaussian 
random numbers. These random numbers are used to fill 
the elements of the vector y. The y vectors have an 
expected value equal to the zero vector and a covariance 

matrix equal to the identity matrix. 

»\ ' 

i ' 

■y : ‘ 

Estimator Evaluation 


Since 



N. 

l Q‘ (x, ) is an unbiased estimate of 
k=l 1 K 


Q^(x) p (x Cjdx, the estimator 


M M P . . , 

_1 1 1 


N 


P = I P • I 


P, IN . 


i=i 1 j=i "i y«j k«i 


f 




(A. 


is an unbiased estimate of the probability of correct 
classification (Moore, Whitsitt, and Landgrebe, 1976) . 

The variance of the estimator can be shown to be 
smaller than the variance for a count estimator using 
stratified sampling (Moore, Whitsitt, and Landgrebe, 1976) 
The variance for the stratified count estimator is 
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If the probabilities of correct classification for each 
class are known then the variance of the stratified 
count estimator can be evaluated and used as a bound 
on the stratified posterior estimator. 

A FORTRAN program SPESTM was written which accepts 
the mean vectors and covariance matrices for up to 
ten classes and up to 10 dimensions. These statistics 
are used to generate random vectors and estimate the 
classification performance for the classes specified 
by the distributions. 

To test the method and the program a three-class 
problem was constructed. The mean vectors for the classes 
were 

- [ —1 r “If ...f “1] 

M 2 = [0, Of . .., 0] T 

m 3 = [If If ..., i] T 

The covariance for each class was the identity matrix. 

The number of random vectors generated for each class was 
1000. The exact classification accuracy as a function 
of the dimensionality can be evaluated for this case 

P = 1 - erfc ( /L/2) 
cl 

P c2 =1-2 erfc (VL72) 

P _ = 1 - erfc ( /L/2) 

P = 1 - 4/3 erfc ( /l/ 2 ) 
c 


where erfc (a) = ~~ dx 

■'a /2ir 

° 5 i ) 

and L is the dimensionality* Table AL contains the results 
of evaluating the class conditional performance and 

■ -■ I) 

the overall performance for from one to ten dimensions. 

A bound on the standard deviation of the estimator 
can be computed by calculating the standard deviation 
for the stratified count estimator. Table A2 lists 
the standard f^yiations for from one to ten dimensions 
for this experiment. 

The actual variance was estimated by repeating the 
classification performance estimation 20 times using 

•f 

different starting points in the random number generator. 

The maximum difference between the estimate and the true 

value E and the standard deviation from the true 
max 

value were computed for from one to ten dimensions as 
shown in Table A3. 

Based on the results presented in the tables, differ- 
ences in estimation of overall performance of less than ,005 \ 
(h of 1%) will not be considered significant. The per- 
formance of the algorithm is demonstrated to be quite 
adequate for its intended use. The class conditional esti- 
mates are less reliable but are sufficient to observe trends 

in the performance due to the individual classes. Th£ run- 

// ')\ 

‘ 1 

ning time for this algorithm is quite reasonable, even for 
ten dimensions. 



I II nr IT — — *1 — — w 




r*zrs.w*!">»<r3*r.w *. «r. • *•■ :<■> 


Table Al . Test of error estimator. 







A 



■A 


P 

P : 

P 

P 

P , 

p 

p 

P 


C 1 

c 2 

C 3 

c 

cl 

c2 

c3 

c 

1 

0.6915 

0.3829 

0.6915 

0.5886 

0.6859 

0.3793 

0.7001 

0.5884 

2 

0.7602 

0.5205 

0.7602 

0.6803 

0.7671 

0.5116 

0. 7700 

0.6829 

3 

0.8068 

0.6135 

0.8068 

0.7423 

0.8037 

0.6202 

0.8081 

0.7440 

4 

0.8413 

0.6827 

0.8413 

0.7885 

0.4283 

0.6852 

0.8550 

0.7895 

5 

0.8682 

0.7364 

0.8682 

0.8243 

0.8642 

0.7425 

0.8703 

0.8256 

6 

0.8897 

0.7793 

0.8897 

0.8529 

0.8767 

0.7939 

0.8787 

0.8498 

7 

0.9071 

0.8141 

0.9071 

0.8761 

0.8993 

0.8242 

0.9065 

0.8766 

8 

0,9214 

0.8427 

0.9214 

0.8951 

0.9129 

0.8472 

0.9240 

0.8947 

9 

0.9332 

0.8664 

0.9332 

0.9109 

0.9193 

0.8809 

0.9360 

0.9120 

10 

0.9431 

0.8862 

0.9431 

0.9241 

0.9209 

0.9012 

0.9481 

0.9234 



to 

M 


Table A2. Theoretical bound of standard deviation for 
different dimensions. 

L 
1 
2 

3 

4 

5 

6 

7 

8 
9 

10 




Table A3. Experimental standard deviation of estimates. 


Dimensions 


cl 


c2 


c3 


P 

c 


1 


2 


3 


4 


5 


6 




9 


10 


.016 

.033 

. .010 
.019 

.017 

.049 

.003 

.005 

.018 

.036 

.010 

.018 

.014 

.027 

.002 

.005 

.016 

.046 

.017 
.0 31 

.017 

.055 

.003 

.007 

.011 

.025 

. 016 
.029 

.015 

.029 

.003 

.005 

.015 

.031 

.014 

.033 

.012 

.026 

.002 

.004 

.014 

.026 

.014 

.023 

.010 

.022 

o 00 3 
.006 

.009 

.027 

.016 

.033 

.012 

.027 

.003 

.005 

.013 

.025 

.013 

.036 

.012 

.023 

.003 

.006 

.013 

.026 

.014 

.031 

.012 

.021 

.002 

.004 

.009 

.016 

.012 

.024 

.009 

.019 

.002 

.005 


a 


a 


a 


a 


a 


a 


c 


a 


a 


a 


E 

max 

E 

max 

E max 

E 

max 

E 

max 

E 

max 

E 

max 

E 

max 

E 

max 

E 

max 


a = standard deviation 


E = maximum difference between estimate and true value 
max over 20 trials 
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Appendix B 

Data Base Description 
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Appendix B. Data Base Description 


The data sets used in this research are described and 
sufficient information to access this data is provided. 


Data set number 1 

Location: Williams County, North Dakota 

Collection date: 8 May 1977 

CLASS SAMPLE FUNCTIONS/CLASS 


SPRING WHEAT 

664 

SUMMER FALLOW 

437 

PASTURE 

164 


Field measurements library tape number: 4896 

Comments: Wheat is just emerging (plant height ~8 cm). 


Data set number 2 


Location: Williams County, North Dakota 

Collection date: 

29 June 1977 

CLASS 

SAMPLE FUNCTIONS/CLASS 

SPRING WHEAT 

787 

SUMMER FALLOW 

291 

PASTURE 

161 


Field measurements library tape number: 4897 

Comments: Wheat is green and at full height. The 

mixture is below average. 









Data 3et number 3 

i 

Location; Williams County, North Dakota 
Collection date: 4 August 1977 

° CLASS SAMPLE FUNCTIONS/CLASS 


SPRING WHEAT 931 

/SUMMER FALLOW 330 

PASTURE 183 

Field measurements library tape number: 4898 

Comments: Wheat is mature. t In a few fields the wheat 

is harvested.; 


Data set number 4 

Location: Finney County, Kansas 

Collection date: 28 September 1976 


CLASS 


WINTER WHEAT 
SUMMER FALLOW 
GRAIN SORGHUM 


SAMPLE FUNCTIONS/CLASS 


141 

414 

277 


Field measurements library tape number: 4292 

Comments: Wheat is emergent while other crops are at 

mature stages. 


Data set number 5 


Location: Finney County, Kansas 

Collection date: 3 May 1977 


CLASS SAMPLE FUNCTIONS/CLASS 

WINTER WHEAT 658 

SUMMER FALLOW 211 

OTHER CROPS 652 

Field measurements library tape number: 4295 

Comments: Wheat is near full canopy and green, 

Other crops are emergent. 




Data „set number 6 

Location: Finney County, Kansas 

Collection date: 26 June 1977 


CLASS 


SAMPLE FUNCTIONS/CLASS 


WINTER WHEAT 
SUMMER FALLOW 
GRAIN SORGHUM 


677 
643 
15 7 


Field measurements library tape number: 4296 

Comments: Wheat is mature and ready for harvest. 


Accessing the Data 

A software package called EXOSYS (Simmons et al, 1972) 
was developed at LARS for handling field measurement data. 
Sampled spectral response functions are calibrated and 
stored on magnetic tape along with pertinent identiciation 
information. EXOSYS, also, provides access to the field 


measurement data through three processors - IDLIST, GSPEC, 
and DSEL. The IDLIST processor scans the tape and lists 


information from the identification record as required. One 
can use this information to select appropriate runs to repre 
sent the ensemble. The GSPEC processor creates a punched 
deck consisting of the 100 sampled values of the spectral 
response functions for all of the desired runs. The DSEL 
processor simulates rectangular spectral channels and uses 
data from the tape to evaluate the response in each channel 
for the ensemble. 

The GSPEC processor is used to assemble the data sets. 

It is required to specify the library tape number, the 








cover type or class, and the collection date to put together 
all of the sample functions on a particular date for a 
single class (see Figure Bl) . The sample functions are 
collected by class to facilitate the estimation of 
class dependent statistics. A deck of cards containing 
the sample functions for all of the classes is read by the 
routing SPRDCT and stored on disk in the format as shown 
in Figure B2. All programs which access the data sets 
expect the data to be in this format. Supplimentar^ 
information such as the number of samples in each class 
and the name of each class are added from the terminal 
during the execution of SPBDCT (Table Bl) . Processing of 
the ensemble is accomplished with the data stored on 
the desk file? however, the data file may be stored on 
magnetic tape between processing sessions. 


i f 
a 

\s 







Table Bl. ID information locations in data storage 
format. I 

‘ t 


WORD 

ITEM 





1-15 

Date data 

set was 

assembled 


16 

Experiment number 




17 

Number of 

classes 




18 

Number of 

sample points (=100) 

21 

Number of 

samples 

for 

class 

1 

22 

Number of 

samples 

for 

class 

2 

23 

Number of 

samples 

for 

class 

3 

24 

Number of 

samples 

for 

class 

4 

25 

Number of 

samples 

for 

class 

5 

26 

Number of 

samples 

for 

class 

6 

27 

Number of 

samples 

for 

class 

7 

30-39 

Label for 

class 1 




40-49 

■ / '• 

Label for 

class 2 




50-59 ' 

Label for 

class 3 




60-69 

Label for 

class 4, 




70-> 79 

Label for 

class 5 




80-89 

Label for 

class 6 




90-99 

Label for 

class 7 






//' 


X 
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Appendix C. Correlation Between Sample Measurements on the 

Spectrum 


Graphs of the correlation coefficients as a function 
of wavelength which were used to help locate spectral band 
edges are presented. Traditional methods of spectral analy- 
sis are not appropriate for this analysis since the calcula- 
tion of the spectral density to obtain a sampling bandwidth 
assumes that the stochastic process is stationary and that 
the sampling rate will be uniform over the entire interval 
A. It is believed that it is necessary to sample some 
parts of the spectrum more frequently than others? hence, 
the correlation measure proposed here is used. 

The measure of the correlation between two adjacent 
spectral samples u^ and u^ + ^ on the interval A is the corre- 
lation coefficient given by 


p i , i+1 


Et(u.-u.)(u i+ 1 -u. +1 n 


E( (u.-u. 

■5 J. -L 


(C.l) 


> 2 ,E{(u i + r*i + i> 2> . 


The correlation coefficient can be computed using the 

' X 

eigenvalues and eigenvectors. The matrix equation which 


was solved is given by 
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$r = kw <t> 


or 


Ki/J = <pr<p T 


(C.2) 


i 

The covariance matrix K consists of the elements k. 

where the k, . are the correlations between the i^h 
ij 

and j th spectral samples. Assuming W is the identity 
matrix, the correlation coefficient is equal to k . . 
normalized by dividing by the square root of the product of 
the respective variances. Using equation C.2, two adjacent 
spectral channels have the correlation coefficient 


^ ^ik^i+1 ,k Y k 

p i,i.+l ~ 

[ ( ^ <J) ik tJ, ik Y k ) ^i+lfk i+l,k Y k )] 2 


(C. 3) 


where <t>^ is the i*"* 1 element of the k^* 1 eigenvector. 

The second weight function from Figure 4.6 is used. 
Since this weight function is unity everywhere on A except 
over the water absorption bands, equation C.3 is valid 
everywhere except over the absorption bands and on the edges 
of the absorption bands. 

The graphs of the correlation coefficients as a func- 
tion of wavelength for each of the six data sets are pre- 
sented in Figures C.l through C.6. 







WAVELENGTH ( KICROf'ETERS i 


Figure C4. Correlation coefficients vs wavelength for 
Finney County, September 28, 1976. 
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Appendix D 

Computer Program Listings 



SPRDCT 

PURPOSE 

EXOSYS DATA IN PUNCHED FORMAT IS READ AND STORED ON TAPE 
REVISED 

3 JULY, 1978 •(;' 


COMMON ID(100) ?■ 

INTEGERS IN(S) ,DAY(3> ,TIME(3) ,NOGRPS, GROUPS (5) 

INTEGERS ISAM , SINT, DATE ( IS) , INFO< 10,7) 

REAL«4 NBCOEF (2 , 5) , DATA (2500) , X < 100) 

EQUIVALENCE (DATE ( 1 ) , 1D< 1 ) ) , < INFO< 1,1), ID<30) ) , ( INFO< 1 ,2) ,ID(40> ) , 
•< INFO< 1 ,3) , ID (50) ) , ( INFO( 1 ,4) , ID(60) ) , ( INFO (1 ,5) , ID(70) ) , (INFO( 1 ,6 
•>,ID(80)) ,(INF0(1,7) ,ID(90)) 
ilEWIND 11 
NT * 100 

C . 

C ID INFORMATION 
C 

VRITE(16, 10) 

10 FORMAT (5X, ’TYPEIN DATE’, /IX, 15 (V’)) - 
READ( 15, 15) DATE 
15 FORMAT ( 15A1> 

C o'.. 

C EXP. NO. , NUMBER OFCLASSES, AND NUMBER OF DIMENSIONS 
C 

WRITE< 16,20) 

20 FORMAT (SX, ’TYPE EXP. NO. .CLASSES, AND DIMENSIONS’,/’ /// // ///’) 

ii READ( 15,25) ID ( 16) , ID( 17) , ID( 18) 

25 F0RMAT(I3,2X, 12, 3X, 13) 

Ci 

C EXPERIMENT INFORMATION 

C v. ,, ' • 

NCLS * ID( 17) 

DO 35 1*1 ,NCLS ' : 

WRITE( 16,30) I v 

30 FORMAT (5X, ’TYPE CLASS INFO AND NO . SAMPLES FOR CLASS \I1,/1X,10( 
•’/’) ,4X, *///’) 

35 READ (15,40) ( INFO(L, I ) ,L=* 1 » 10) , ID (20+ I) 

40 FORMAT (10A1.4X, 13) 

WRITE ( 11) ID 
CALL SPLBL 
C 

C READ SAMPLE FUNCTIONS FROM EACH CLASS 
C • ‘ 

DO 500 K»1,NCLS \ 

WRITE (6 ,80) .V. 

80 FORMAT (5 (/),I0X, ’SAMPLE FUNCTIONS’) 

NF = ID(20+K) 

DO 200 JJ=1 ,NF 

100 READ(5, 1000) DAY, TIME, IN,NOGRPS.NOSAMS 

1000 F0RMAT(3A4,2X,3(I2, IX) ,/,45X,SA4,/,20X,Il ,8X,I3,/) 

BEAD (5, 1 100) (DATA( I) , 1=1 ,NOSAMS) 

1100 FORMAT (20A4) 

WRITE (6, 150) IN . 

150 FORMAT (10X.20A4) 

DO 160 I«1,NT . 

160 X(I) » DATA(IM) 

WRITE(ll) X 
200 CONTINUE 
500 CONTINUE 

END FILE 11 

STOP 

END 


noowi oooooooooooonooooooooooooooooono 


236 


SPOPTM 


PURPOSE 

TO DESIGN THE OPTIMUM SENSOR FOR A GIVEN DATA SET. 

USAGE 

CALLED FROM EXEC ROUTINE 

DESCRIPTION OF PARAMETERS 

AM - MEAN VECTOR OF DATA 
COV - COVARIANCE MATRIX OF DATA 
PHI - MATRIX OF EIGENVECTORS. 

GAM * EIGENVALUES 

N - DIMENSIONALITY OF DATA SET 

NCLS - NUMBER OF CLASSES - 

SUBROUTINE AND FUNCTION SUBPROGRAMS CALLED 
EIGENP , EISORT t SPWGT 1 

METHOD 

THE KARHUNEN-LOEVE EXPANSION WITH THE MAXIMUM LIKELIHOOD ESTIMATE 
OF THE COVARIANCE MATRIX AS THE KERNEL IS USED TO REPRESENT THE 
RANDOM PROCESS. 

REVISED V 

14 AUG, 1978 \ 


COMMON ID (100) 

REALM AM( 100) , Y< 100) ,COV(5050) ,PHIP( 100, 100) 

REAL* 8 VECI ( 100, 100) ,EVI<100) , INDIC(100) , ACOV< 100, 100) V GAM< 100) 
REAL*8 PHI ( 100, 100) ,SUM 

REALM X< 100) , W( 100) ( 

REWIND 2 
WRITE( 16,5) 

FORMAT (5X, ’OPTIMUM SENSOR DESIGN’ ) 

READ ID INFORMATION 

READ (2) ID j 

WRITE(6,8) 

8 FORMAT( 1H1 ,5(/) ) 

CALL SPLBL 
N = ID( 18) 

NCT « NMN+D/2 


c 

C COMPUTE COVARIANCE 
C 

WRITE( 16, 10) 

10 FORMAT (5X, ’COVARIANCE BEING ESTIMATED (SPOPTM) ’ ) 

NCLS - ID( 17) 

NFT - 0 
DO 20 1*1, NCLS 
20 NFT « NFT + IDC20+I) 

CON a DFLOAT (NFT ) /DFLOAT (NFT- 1 ) 

DO 30 1*1, N - 
30 AM( I) * 0.0 

DO 35 1*1, NCT 
35 COV(I) * 0.0 

DO 65 IJ*1 ,NFT 
READ (2) X 
IN = 0 
DO 50 1*1, N 

AM(I) = AM( I) + X( I) /DFLOAT (NFT) 

DO 50 J* 1,1 
IN «* IN + 1 

COV(IN) = COV(IN) + X(I)*X< J)/DFL0AT(NFT“1) 

50 CONTINUE 
65 CONTINUE 
IN * 0 
DO 60 1*1, N 
DO 60 J=1 , 1 
IN * IN + 1 

€OV(IN) - COV(IN) - CON # AM(I)*AM(J) 

60 CONTINUE 
C 

C WEIGHTING FUNCTION 

C 

IN « 0 

DO 210 1*1 ,N 

DO 210 J*1,I 

IN * IN + 1 
ACOV(I.J) * OOV< IN) 

ACOV(J , I) * COV(IN) 

210 CONTINUE 

c x\ . o 

CALL SPWGT2(W) \ J 

C / 

DO 230 1*1, N 

DO/ 250 J*1,N 

250 ACOV(I, J) = ACOV ( I , J) *W( J) 

C '! 

C COMPOTE TRACE OF COVARIANCE 
C 

SUM * 0.0 
DO 80 1 = 1, N 

80 SUM = SUM + ACOV (1,1) 

C 

C COMPUTE EIGENVALUES AND EIGENVECTORS 

C . . . 

WRITE( 16,75) 

75 FORMAT <5X, ’EIGENVALUES AND EIGENVECTORS (EIGENP) ’ ) 
NM = N 
T = 56. 

CALL EIGENP (N,NM, ACOV, T, GAM, EVI ,PHI,VECI , INDIC, V) 
CALL EISORT(N,GAM,PHI) 
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C 

C PRINT EIGENVALUES AND MEAN-SQUARE ERROR 
C 

00 ■ FLOAT (NFT)/ (FLOAT (NFT- 1 ) •FLOAKNFT-l ) ) 

Cl - FLOAT <4*NFT- 1 ) / (FLOAT (NFT- 1 ) • FLOAT (NFT- 1 ) ) 

VRITB(6,1 10) 

1 10 FORMAT <S</),5X,’N’ ,5X, ’EIGENVALUE’ ,SX, 'VAR (GAM) ’ ,5X, ’VAR(PBI) ’ ,5X, 
•’MEAN-SQUARE ERROR’) 

DO 1S0 I>1,30 

VARP - 0.0 

DO 120 J- 1.100 

IF(J .EQ. 1) GO TO ’ 5 

VART - VARP ♦ C0*GAMi I)*GAM< J)/(GAM(I) - GAM(J))**2 
115 CONTINUE 
120 COKflNUE 

VARG - C1*GAN ( I ) *GAM ( I ) 

SUM - SUM - GAM ( I ) 

WRITE (6, 145) I ,GAM( I) , VARG, VARP, SUM 
145 FORMAT (4X , I2,4X,F10.4,4X,F10.4,2X,F10.4,2X,F14.6) 

150 CONTINUE 

DO 155 J- 1 ,N 
DO 155 I-l.N 

155 PHIP(I , J) - PHI(I.J) 

DO 180 J- 1 ,20 

VRITE(7, 160) (PHIPd , J) , I-l ,N) 

1C0 FORMAT (20A4) 

180 CONTINUE 
STOP 
END 


C 

C 

C SUBROUTINE FOR SORTING EIGENVALUES INTO DECENDING ORDER. 
C 

C 

C 

SUBROUTINE EISORT(N,EVR,VECR) 

REAL* 8 STORE, EVR(N) ,STOVEC( 100) ,VECR(N,N> 

C 

DO 85 I- 1 ,N 
DO 85 J- 1 ,N 

IF(EVR(I) - EVR ( J ) ) 85 , 85 , 70 
70 STORE - EVRd) 

EVR(I) - EVR(J) 

EVR(J) » STORE 
DO 80 K-1,N 

STOVEC(K) - VECR(K, I ) 

VECR(K, 1 ) - VECR(K.J) 

VECR(K, J) ■ STOVEC(K) 

80 CONTINUE 

85 CONTINUE 

RETURN 
END 
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C SUBROUTINE SPLBL PROVIDES A LABEL TO DESCRIBE THE DATA USED IN THE 
C EXPERIMENT. INFORMATION t INCLUDED IN THE LABEL CONSISTS OF THE DATE, 
C EXPERIMENT NUMBER, NUMBER OF CLASSES, NUMBER OF SAMPLES IN EACH CLASS 
C AND INFORMATION ON EACH CLASS. 

C 

C 7 JULY, 1977 
C 

€••••••••••••••••••••••••••••••••••• 

C 

SUBROUTINE SPLBL 
COMMON ID* 100) 

INTEGERS DATE* 15) , INFO( 10,5) 

EQUIVALENCE* ID( 1 ) .DATE* 1) ),( INPO< 1 , 1 ), ID<30> ),( INFO< 1 ,2) , ID<40) ) , 
•* INFO* 1 ,3) , ID*50) ) , * INFO* 1 ,4) , ID*60> ) , ( INFO* 1 ,5) , ID(70> ) 
VRITE(6,20> 

WRITE <6, 30) 

WRITE (6, 40) DATE 
WRITE(6,S0) ID* 16) 

WRITE(6,SS) ID* 17) 

NCLS - ID* 17) 

DO 10 J-l.NCLS 

WRITE(6, 60) (INFOd.J), 1-1,10) 

WRITE (6, 80) ID(20+J) 

10 CONTINUE 

20 FORMAT* 1H1.////15X, ’LABORATORY FOR APPLICATIONS OF REMOTE SENSING* 
1) 

30 FORMAT (29X,* PURDUE UNIVERSITY’) 

40 FORMAT* 12X, ’SAMPLE FUNCTION INFORMATION’ , 1 IX, 15Al/> 

50 FORMAT* 10X, ’EXP. NO, * ,27* ’ . ’ ) . 13) 

55 FORMAT U0X,’ NUMBER OF CLASSES ',18 (’.'), 12) 

60 FORMAT* 10X, ’CLASS’ ,30* ’.’>,20A1) 

80 FORMAT* 10X, 'NUMBER OF SAMPLE FUNCTIONS’ ,9* ’.*), 13) 

RETURN 

END 


C 

C 

C WEIGHTING FUNCTION NUMBER 1 

C 

C 

SUBROUTINE SPWGTl(V) 

REAL *4 W * 100) 

C 

WRITE <6, IS) 

15 FORMAT *//SX, ’WEIGHTING FUNCTION NUMBER 1’//) 
DO 20 1-1,100 
W*I> - 1.0 
29 CONTINUE 
RETURN 
END 
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C 

c 

c 

c 

c 


c 

IS 

c 


26 

30 

40 


WEIGHTING FUNCTION NUMBER 2 



SUBROUTINE SPVGT2(V> 

REAL*4 W< 100) 

WRITE(6, 15) 

FORMAT (//SX,’ WEIGHTING FUNCTION NUMBER 2*//) 

DO 20 1-1,100 
W(D - 1.0 
CONTINUE 
DO 30 1-46,55 
VCD - 0.001 
DO 40 1-68,77 
V(I> - 0.001 
RETURN 
END 



C 

C 

C 

C 

C 


C 

15 


WEIGHTING FUNCTION NUMBER 3 


SUBROUTINE SPWGT3(W) 

REALM W< 100) 

WRITE (6, 15) 

FORMAT (//5X, ’WEIGHTING FUNCTION NUMBER 3V/) 

W(l) - .73 

W(2) - .91 

W(3) - 1.08 

W(4) - 1.18 

W(5) - 1.22 

W(6) -1.20 

W(7) » 1.20 

W(8) - 1.18 

W(9) -1.17 

W(10) - 1.17 

W(ll) » 1.17 

W( 12) - 1.18 

W< 13) ■ 1.17 

W( 14) - 1.15 

W( 15) - 1.11 

W( 16) = .83 

W( 17) -1.04 

W< 18) - .57 

W( 19) « .91 

W (20) = .86 

W (2 1 ) ■ .80 

W<22> » .86 

W(23) = .81 

W(24) - .61 

W(25) » .48 

W(26) - .26 

W(27) « .28 

W<28) » .58 

W(29) « .65 

W(30) = .63 
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W(31) 

W(32) 

W<33> 

W(34) 

W<35> 

W(36) 

W(37) 

W(38) 

V<39) 

W<40) 

V(41) 

V<42> 

W (43) 

V(44) 

W(4S) 

W<46) 

W<47) 

W<48) 


.61 

.59 

.53 

.51 

.25 

.14 

.14 

.31 

.31 

.31 

• 41 

.41 

.34 

.34 

.34 

.14 

.14 

.001 


W(49) ■ .001 
W(50) « .001 
W(51) * .001 
V(S2) a .00 
W(53) . .08 
W(54) « .08 
V<55) « ,08 
W<56) * .21 
W(57) » .21 
W(58) » .21 
W(59) * .21 
W<60> « .21 
V<61) ■ .19 
V(62) = 

W( 63 ) 


W(64) 

W(65) 

W< 66 ) 

W(67) 

W(68) 

V(69) 

W(70) 

V<71) 

W(72) 

W(73) 

W(74) 

W(75) 

W( 76) 
W(77) 
W<78> 
W(79) = 
W(80) = 
W(81) « 
V(82) « 
W(83) = 
W(84) e 
W( 85 ) « 
W(86) - 
W(87) - 
W<88) = 
W(89) * 
W (90) . 
W(9I) » 
W(92) » 
W(93> * 
W(94) - 
VOS) * 
V(96) * 
W(97) - 
W (98) « 
W<99) « 
W< 100 > . 
RETURN 
END 


19 
.15 
.15 
.15 
.11 
.11 
.03 

.03 

.03 

.001 

.001 

.001 

.001 

.001 

.001 

.901 

.02 

.02 

.02 

.04 

.04 

.07 

.07 

.07 

.09 

.09 

.10 

.10 

.10 

.10 

.10 

.07 

.07 

.07 

.04 

.04 

.01 

.01 

.02 
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C 

fC 

j C WEIGHT FUNCTION NUMBER 4 
V C 
C 

SUBROUTINE SPWGT4(W) 

REALM W ( 100) 

C 

WRITE (6, IS) 

IS FORMAT (//5X, ’WEIGHT FUNCTION NUMBER 4V/> 

DO 38 1=1,15 
30 W(I) = 0.7 

DO 40 1=16,45 
40 W(I) = 1.0 

DO 50 1=46,52 /, 

50 W(I) = 0.1 V 

DO 60 1=53,67 
60 W(I) = 1.0 

DO 70 1=68,77 
70 W(I) = 0.01 

DO 80 1=78,100 
80 W(I) = 1.0 
RETURN 
END 


C , 

C SPTES TRANSFORMS THE DATA USING THE OPTIMUM SET OF BASIS 
C VECTORS, COMPUTES THE MEAN-SQUARE ERROR, AND COMPUTES THE 
C STATISTICS FOR EACH CLASS. 

C 

C 6 FEBRUARY, 1978 
C 

c 

COMMON ID< 100) 

REALM P( 10) ,PHI < 100,20) ,X( 100) , Y< 100) ,Z( 100) 

REALM AM< 100) ,AVE<20, 10) ,COV (210, 10) 

C 

C SELECT NUMBER OF TERMS 
C 

WRITE( 16, 10) 

10 FORMAT <5X , * NUMBER OF TERMS? ’ ) 

READ ( 15, 15)NTERM 
15 FORMAT (12) 

REWIND 2 
READ (2) ID 
NCLS = ID< 17) 

N = ID( 18) 

NCT = NTERM* (NTERM «• l)/2 
NFT = 0 

DO 20 1=1, NCLS 
20 , NFT = NFT + ID(20+I) 

DO 25 1=1, NCLS 
P(I) = 1. /FLOAT (NCLS) 

25 CONTINUE 

WR ITE (7 , 28) NCLS , NTERM 
28 FORMAT < 12, 3X, 12) \\ 

WRITE (7 , 30) (P ( I ) , I = 1 , NCLS) 

30 FORMAT (10F6. 4) 



c 

C COMPUTE MEAN FUNCTION 
C 

DO 300 1=1, N 
300 AM(I) = 0.0 


DO 320 K=1 ,NFT 
READ (2) X 
DO 320 1=1, N 

AM (I') = AMU) + X(I)/FLOAT(NFT) 

320 CONTINUE 
REWIND 2 
READ (2) ID 

C ■ o 

C READ EIGENVECTORS JT" 

C 

DO 40 J=1,NTERM 

READ(S ,3S) (PHI ( I , J) , 1= 1 ,N) 

35 FORMAT (20A4) 

40 CONTINUE 
C 

C LOOP ON THE SAMPLE FUNCTIONS IN THE DATA SET 
C 


AVESO = 0.0 
DO 200 ICLS= 1 ,NCLS 
DO 50 1=1, NTERM 
50c AVE(I, ICLS) = 0.0 
DO 55 1=1, NCT 
55 COVU , ICLS) = 0.0 
C 
C 


NF = ID(20+ICLS) 

CON = FLOAT (NF) /FLOAT (NF-1) 

DO 150 ISAM=1 ,NF 
C 

C READ SAMPLE POINTS FROM FUNCTION 
C 

READ (2) X 
C 

C TRANSFORM DATA USING BASIS FUNCTIONS 
C 

DO 70 J=l, NTERM 
Y<J) = 0.0 


DO 70 1=1, N 

Y(J) = Y(J) + PHI (I, J)*.(X(I> - AMU)) 

70 CONTINUE 
C 

C COMPUTE SQUARED ERROR 
C 

DO 80 1=1, N 
82 Z(I) = 0.0 


DO 85 J=l, NTERM 
DO 85 1=1, N 

Z(I) = Z(I) ♦ PHI (I,J)*Y(J) 


85 CONTINUE 

DO 88 1=1, N 

88 Z(I) = Z(I> ♦ AMU) 

XSQ = 0.0 

ZSQ = 0.0 

XZ = 0.0 

TSQ = 0.0 

DO 90 1=1, N 

XSQ = XSQ + X(I)*XU) 

ZSQ = ZSQ + Z(I)«ZU) 

XZ = XZ + 2.0*X(I)*Z(I) 
90 CONTINUE 

ESQ = (XSQ - XZ + ZSQ) 
AVESQ = AVESQ + ESQ 


C ' ■ 

C COMPUTE STATISTICS 

c 

DO 100 1=1 ,NTERM 

AVE(I,ICLS> = AVE(I,ICLS) + Yd)/FLOAT(NF) 

100 CONTINUE 

• IN * 9 ■ "■ 

DO 110 J*1,NTERM ■ ^ 

DO 110 Iff 1, J ■ 

in * in > : -s . © 

COVdN.ICLS) = COV(IN, ICLS) + Y(I)«Y<J)/FLOAT<NF~l> 

110 CONTINUE 

150 CONTINUE y~ 

c '/• . v v- ■ 

C PRINT STATISTICS 

c . 

DO 160 J=1,NTERM 
DO 160 1=1, J 
IN = IN ♦ 1 

COVdN.ICLS) = COVdN, ICLS) - CON*AVEd , ICLS)* AVE(J, ICLS) 
160 CONTINUE 

WRITE(6, 165) ICLS 

165 F0RMAT<5</) , 10X, ’STATISTICS FOR CLASS’, 14) 

CALL MCOVP (NTERM, AVE( 1 , ICLS) , COV ( 1 , ICLS) ) 

WRITE(7, 170) (AVEd , ICLS) , 1=1 ,NTERM) 

170 FORMAT (20A4) 

WRITE (7, 175) (COVd , ICLS) , 1 = 1 ,NCT) 

175 FORMAT (20A4) 

C '■ ' 

C - 

200 CONTINUE W 

AVESQ = AVESQ/FLOAT (NFT) 

WRITE(6,210)AVESQ 

210 FORMAT C///10X, ’MEAN-SQUARE ERROR =’ ,EI0. 4) 

STOP 

■ - END ■ :: . ■, v ;- >■.<■ 





C SPDECK 
C PURPOSE 

C 1. TO PROVIDE STATS IN APPROPRIATE FORM FOR SEPARABILITY 
C PROCESSOR 

C 2. TO SELECT ’FRATURES TO BE INPUT TO ERROR ESTIMATOR. 

C 

C 11 AUG., 1978 

C ■ 9; ‘ . • ; 

c 

c 

I NT EGER* 4 IC<10) 

REAL*4 AVE(20, 10) ,COV(210, 10) ,P( 10) ,COVTP<20,20) 

c 

WRITE! 16, 10) 

10 FORMAT (5X, ’SEP = 0, SEL = 1’) 

READ( 15, 15) IS 
15 FORMAT (ID 

READ(5,20)NCLS,N 
20 FORMAT (12, 3X, 12) 

■ NCT = N*(N+l)/2 - 

READ(5,25) (PCI) , 1=1 ,NCLS> 

25 FORMAT (10F6. 4) 



\\ C \\ 

) C READ STATS 

({ C • 

V,,v DO 40 K= 1 ,NCLS 

READ (5, 30) (AVE(I ,K) , 1= 1 ,N) 
READ(5,30)(COVd,K),I = l,NCT) 

30 FORMAT C20A4) 

40 CONTINUE 

IF( IS)45,45, 100 
45 CONTINUE 

C 

C PUNCH STAT DECK 
C „ 

DO 50 K=1,NCLS 

50 WRITE<7,70) (AVE(I,K> , 1=1 ,N) 

DO 60 K=1,NCLS 

60 WRITE<7,80) (COVd ,K) , 1 = 1 ,NCT) 

70 FORMAT (’ MN’ ,17A4> " 

80 FORMAT < *CV’ , 17A4) 

90 STOP 
C 
C 

100 WRITE! 16, 1 10) 

110 FORMAT (5X, ’TYPE NUMBER OF CHANNELS DESIRED 
READ( 15, 1 15)NT 
115 FORMAT (12) 

DO 150 K= 1 ,NT 
WRITE( 16, 120) 

120 FORMAT (5X, ’SELECT CHANNELS (12) ’) 

REAEK 15, 125) IC(K) 

125 FORMAT (12) 

150 CONTINUE 

C . v . 


WRITE(7,20)NCLS,NT ^ 
WRITE(7,25) (PCI) , 1=3 ,NCLS) 
NCTP = NT*(NT+I)/2 
C 

DO 200 ICLS=1 ,NCLS 
C 



DO 160 K=1,NT 
I CP = IC(K) 

160 AVE(K.ICLS) = AVEdCP, ICLS) 

WRITE (7, 30) (AVEd.ICLS) ,1=1, NT) 


C 


IN = 0 

DO 165 1=1, N J 

DO 165 J=1,I l : 

IN = IN + 1 

COVTP (I , J ) = COVdN, ICLS) 
COVTP ( J , I ) = COVdN, ICLS) 

165 CONTINUE 
C 

L = 0 ,, 

DO 1S0 K=1,NT 
L = L + 1 
DO 180 KL=1 ,L 
ICK = IC(K) 

I CL = IC(KL) 

COVTP (K.KL) = COVTP (ICK, ICL) 
180 CONTINUE 
C 


185 

200 

C 


IN = 0 

DO 185 3=1, NT 
DO 185 J=1,I 
IN = IN * 1 

COVdN, ICLS) = COVTP (I, J) 
CONTINUE 

WRITE (7, 30) (COVd, ICLS) , 1=1 ,NCTP) 
CONTINUE 


( 12 )’) 



c ,■ - • - 

c ■ : : . 

C SPSUB SIMULATES 5 SUBOPTIMUM SENSORS AND COMPUTES MEAN-SQUARE 
C ERROR AND CLASS STATISTICS FOR A SET- OF DATA.' 

C , 

C ‘31 JANUARY, 1978 

C 

C - - - - - 

C 

COMMON ID (100) 

INTEGERS SENSOR, IR (5) 

REAL*4 PHI (100,28) ,X(100) ,Y(100) ,Z(100) ,AVE(20, 10) ,COV(210, 10) 
REAL* 4 PHIN( 100,20) ,P( 10) ,VAL(100) ,V( 100) 

C 


C LIST OF VARIABLES * 

C AVE(I,J) = MEAN VECTOR FOR CLASS J 

C COV(K,J) = COVARIANCE MATRIX FOR CLASS J 

C ESQ * SQUARED ERROR FOR ONE SAMPLE FUNCTION 

C N = NUMBER OF TERMS IN THE REPRESENTATION 

G NCLS = NUMBER OF CLASSES 

C NF = NUMBER OF SAMPLES IN THE CLASS 

C PHI (I , J) = BASIS VECTOR FOR THE JTH TERM IN THE REPRESENTATION 
C X(I) = SAMPLE FUNCTION FROM ORIGINAL DATA 

C Y(I) = REPRESENTATION VECTOR 

C Z(I) = RECONSTRUCTION VECTOR 

C 

c 

C • .. ■ . 

C SELECT SENSOR FROM TERMINAL n 

C . , . H ' 

WRITE< 16,10) 

10 FORMAT (5X, ’SELECT SENSOR’ ,/10X, ’ 1 . LANDSAT\/10X, ’2. THEMATIC MAPP 
•ER’,/10X,’3. TEST’ ,/10X, ’4. PROPOSED’ ,/10X, ’5. PROPOSED 2’) 

READ( IS, IS) SENSOR 
15 FORMAT (ID 
C 

C ZERO BASIS FUNCTIONS AND SET UP SELECTED SENSOR 
C 

DO 40 J=l,20 

DO 40 1=1,100 : 

40 PHI ( I , J) =0.0 ’ : * 

GO TO (50, 100, 150, 200, 250), SENSOR f 

C 

C SET UP BASIS FUNCTIONS FOR LANDSAT 
C 

50 CONTINUE 

N * 4 
NCT = 10 
DO 60 1=5,9 
60 PHI (1,1) =1.0 
DO 65 1=10,14 
65 PHI (I, 2) = 1.0 

DO 70 1=15,19 
70 PHI (1,3) =1.0 

DO 75 1=20,34 
75 PHI (I, 4) = 1.0 

GO TO 300 
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c 

C SET UP BASIS FUNCTIONS FOR THEMATIC MAPPER 
C 


100 

CONTINUE 




N = 

: 6 





NCT = 

21 




DO 

110 

' 1 = 

=2, 

6 

110 

PHI 

<1, 

1) 

K 

1.0 


DO 

1 IS 

1 = 

=7, 

10 

115 

PHI 

(I, 

2) 

S 

1.0 


DO 

120 

I« 

'11 

,1 s 

120 

PHI 

(I, 

3) 

. ft: 

1.0 

125 

DO 

125 

1 = 

'18 

,25 

PHI 

(I, 

4) 

e 

1.0 


DO 

130 

1 = 

=58 

,68 

130 

PHI 

(I, 

S) 

= 

1.0 


DO 

135 

1 = 

=84 

,97 

135 

PHI 

<1, 

6) 

' S 

1.0 


GO 

TO 

300 



C 

C SET UP TEST BANDS 
C 

ISO CONTINUE 

N «* 6 | 

NCT * 21 . J 

PHI ( 10, 1) = I.O 
PHI (38,2) =1.0 
PHI (52,3) = 1.0 
PHI (64,4) = 1.0 
PHI (74,5) = 1.0 
PHI (92,6) * 1.0 
GO TO 300 
C 

C SET UP PROPOSED SENSOR BASIS FUNCTIONS 
C 

200 CONTINUE 
N = 8 
NCT = 36 
DO 210 1=1,7 
210 PHI (1,1) = 1.0 
DO 215 1=8,13 
215 PHI (I, 2) = 1.0 

DO 220 I=Ki lS 
220 PHI (1,3) = 1.0 
DO 225 1=16,25 
225 PHI (I, 4) =1.0 

DO 230 1=28,30 
230 PHI (1,5) = 1.0 
DO 235 1=31,45 
235 PHI (I, 6) = 1.0 

BO 240 1=56,67 
240 PHI (I, 7) =1.0 

DO 245 1=78,100 
245 PHI (I, 8) = 1.0 

GO TO 300 
C 

C SET UP PROPOSED SENSOR NUMBER 2 
C 


250 

CONTINUE 




N = 8 




NCT = 36 

,V > 



DO 255 1= 

= 1, 

13 

255 

PHI (1,1) 


1.0 


DO 260 I = 

= 14 

,15 

260 

PHI ( 1,2) 

S 

1.0 


DO 265 1= 

= 16 

,26 

265 

PHI (I, 3) 

s 

1.0 


DO 270 1= 

=27 

,32 

270 

PHI (I ,4) 

s 

1.0 
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275 

DO 275 1=33,35 
PHI (1,5) = 1.0 

280 

DO 280 1=36,45 
PHI (I, 6) =1.0 

285 

DO 285 1=56,67 
PHI (I, 7) = 1.0 

290 

DO 290 1=78, 100 
PHI < I, 8) = 1.0 

C 

C 

300 

CONTINUE // 

C 



C NORMALIZE BASIS FUNCTIONS 
C 

DO 430 L=1 ,N 
SUM = 0.0 
DO 410 1=1,100 
SUM = SUM * ABS(PHI(I ,L) ) 
410 CONTINUE 


DO 420 1=1,100 

PHIN< I ,L) = PHI ( I ,L)/SUM / 

420 CONTINUE . 

430 CONTINUE 
C 

C POSITION TAPE AND READ ID INFORMATION 
C 

REWIND 2 
READ (2) ID 


NCLS = ID( 17) 

NFT = 0 
AVESO = 0.0 
DO 302 IL=1 ,NCLS 
302 NFT = NFT + ID(20+IL) 

NXP = ID( 16) 

WRITE (6 , 305) NXP , NCLS , N 

305 FORMAT (1H1, 5 </),20X,’ SUBOPT I MUM SENSOR SIMULATION FOR EXP. NUMBER’ 

•, IS, //20X, ’NUMBER OF CLASSES =’, 13, //20X, ’NUMBER OF DIMENSIONS =’, 
•13) 

WRITE (6, 308) SENSOR 

308 FORMAT <//20X, ’SENSOR ’,11) = 

WRITE(7,310)NCLS,N A 

310 FORMAT < 12, 3X, 12) 

DO 312 IL=1,NCLS 
NF = ID(20+IL) 

312 P (IL) = 1 ./FLOAT (NCLS) 

WRITE(7,314) (PM ,1=1, NCLS) 

314 FORMAT (10F6. 4) 

CALL SPWGT2(W) 

C 

C LOOP ON THE SAMPLE FUNCTIONS IN THE DATA SET 
C 


DO 600 ICLS=1 ,NCLS 
DO 315 1=1, N 
315 AVE(I.ICLS) = 0.0 
DO 320 1=1, NCT 
320 COV( I , ICLS) = 0.0 
C 
C 

NF = ID(20+ICLS) 

CON = FLOAT (NF) /FLOAT (NF- 1 ) 

DO 500 ISAM= 1 ,NF 
C 

C READ SAMPLE POINTS FROM FUNCTION 
C 


READ<2) X 


■ C 

C TRANSFORM DATA USING BASIS FUNCTION 
C 

DO 330 J=1 ,N 
YU) = 0.0 
DO 330 1=1,100 

Y(J) - Y(J) ♦ (PHINd , J)*X(I) ) 

330 CONTINUE 
C ■ 

C COMPUTE SQUARED ERROR 
C 

DO 33S 1*1,100 
335 Z<I) =0.0 

DO 340 J«l,N 

DO 340 1=1,100 

Zd) = Z(I) + PHI (I, J)*Y<J) 

340 CONTINUE ,, 

xsq = 0.0 -I 

ZSQ = 0.0 ■’ 

XZ * 0.0 

DO 345 1=1,100 

XSQ * XSQ ♦ Xd)*X(I)*W(I) 

ZSQ = ZSQ ♦ Zd)*Zd)*W(I) 

XZ « XZ + 2.0*X(I)*Zd)*W(I) 

345 CONTINUE 

ESQ = (XSQ - XZ ♦ ZSQ) 

AVESQ = AVESQ +ESQ / 

C 

C COMPUTE STATISTICS 

C 

DO 350 1=1, N 

AVE( I , ICLS) = AVEd , ICLS) + Y( D/FLOAT (NF) 

350 CONTINUE 
IN = 0 

DO 360 J= 1 ,N 
DO 360 1=1, J 
IN = IN ♦ 1 

COV( IN, ICLS) = COV(IN, ICLS) + Y(I)*Y( J ) /FLOAT (NF-1) 

360 CONTINUE 
500 CONTINUE 

c 

C PRINT STATISTICS FOR THE CLASS 
C 

IN\* 0 

: WlDO 510 J=1,N 
v DO 510 1=1, J 

- IN = IN ♦ 1 

COVdN, ICLS) = COVdN, ICLS) - CON*AVE (I , I CLS) *AVE ( J , ICLS) 
510 CONTINUE 

VRITE(6,S1S) ICLS 

515 FORMAT (5 (/) , I0X, ’STATISTICS FOR CLASS’ , 14) 

, CALX MCOVP (N , AVE ( l , ICLS >, COV (1 , ICLS) ) 

WRITE(7,520) (AVEd, ICLS) ,1 = 1, N) 

520 FORMAT (20A4) 

VR ITE (7 , 530) (COV (I , ICLS) , I = 1 , NCT) 

530 FORMAT (20A4) 

C 

C 

600 CONTINUE 

AVESQ = A VESQ/FLOAT ( NFT > 

WRITE (6, 6 10) AVESQ > 

610 FORMAT (/// 10X , ’ MEAN~SQUARE ERROR * » ,E14.8) 

STOP 
END 



c 

C COMPUTATION OF AVERAGE ERROR OVER A DATA SET. 

C 

COMMON ID( 100) 

REALM X< 100) ,W( 100) 

C 

REWIND 2 
READ (2) ID 
CALL SPLBL 
CALL SPWGT4(W) 

N = ID ( 18) 

NCLS = ID < 17> 

DELTA =0.07 
AVE =0.0 
NFT = 0 

DO 100 K*1,NCLS 
NF = ID(20+K) 

NFT = NFT ♦ NF 
DO 100 L= 1 ,NF 
READ(2) X 
SUM =0.0 
DO 20 1=1 ,N 

SUM = SUM + DELTA*DELTA*X (I)*X(I)*W(I) 

20 CONTINUE 

AVE = AVE + SUM 
100 CONTINUE 

AVE = AVE/FLOAT(NFT) 

WRITE (6, 120) AVE 

120 FORMAT <5</),5X,’ AVERAGE MEASUREMENT ERROR = \E20.8) 

C 

STOP 

END 


C 

C CONTROL PROGRAM FOR SPEST TO PROVIDE FOR VARIABLE DIMENSIONING. 

C 24 JAN., 1978 
C 

REALM PR( 10) ,PHI ( 10, 10, 10) ,P( 10) , AM( 10, 10) ,C0V<55, 10) 

C READ INPUT VARIABLES 

C M - CLASSES: N - DIMENSIONS 

C 

READ(5,10)M,N 
10 FORMAT! 12, 3X, 12) 

C 

C P(I) - APRIORI PROBABILITIES 

C 

NCT = N*(N+l)/2 
READ(S, 15) <P( I) , 1= 1 ,M) 

15 FORMAT! 10FR. 4) 
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\-' r 


C 

c 

c 


20 

30 

40 

50 



V 

l ' . • • . 

MEANS AND COVARIANCES FOR EACH CLASS 

l. . 

DO 30 1=1, M • 

READ<5,20) (AM( J, I) , J=1 ,N) 

READ(5,20) (COVOC, I) ,K= 1 ,NCT) 

FORMAT (20A4) 

CALL MCOVP(N,AM( 1,1), COV( 1,1) > 

CONTINUE 

CALL SPESTM(M,N,PHI,P,AM,COV,PR,PC) 

DO 40 1= 1 ,M 
WRITE (6 50)1 PH( I ) 

FORMAT (//10X| ’PROBABILITY OF CORRECT CLASSIFICATION FOR CLASS’,13, 
•’ = * ,F6.4) 7 

WRITE(6,60)PC 

^FORMAT (///10X, ’OVERALL PROBABILITY OF CORRECT RECOGNITION = \F6.4 

STOP 

END 



c ------ 

C 

C SPESTM IS AN ESTIMATOR OF THE CLASSIFICATION PERFORMANCE FROM A 
C GIVEN SET OF STATISTICS FROM M CLASSES. THE ESTIMATOR IS A 
C STRATIFIED POSTERIOR ESTIMATOR (REF. WHITSITT AND LANDGREBE) . 

C THE PROBABILITY DISTRIBUTIONS ARE ASSUMED TO BE MULTIVARIATE 
C GAUSSIAN 
C 

C 19 JANUARY, 1978 
C 

C ‘ 

SUBROUTINE SPESTM(M,N,PHI,P,AM,COV,PR,PC> 

REAL*4 QP< 10) ,P( 10) ,PR( 10) ,AM( 10, 10) ,C0V(S5, 10) ,C0VT(55> 

REAL*4 GAM( 10, 10) ,PHI (N,N,M) ,DET( 10) ,C0VIN<55, 10) 

REAL*4 Y( 10) ,TE1 ( 10) ,DEL( 10) ,COVU( 10, 10) 

REAL*8 PX(10) , BIG,DEN,SDET (10) ,BETA,Z0,Z1 ,Z2,Z3 
C 

C ----- - ' ■ — 

C 

C LIST OF VARIABLES 
C M = NUMBER OF CLASSES 

C N = NUMBER OF DIMENSIONS 

C PCI) = APRIORI PROBABILITIES OF CLASS I 
C PR( I) = CLASS CONDITIONAL PERFORMANCE 
C PC = OVERALL PERFORMANCE 
C AMU, I) = MEAN VECTOR OF CLASS I 

C COV(J,I> = COVARIANCE MATRIX OF CLASS I (STORED IN UPPER TRIANGULAR 

C FORM) 

C 

c 

c 

IX = 947913 
NCT = N*(N+l)/2 


C: 


c 

C COMPUTE EIGENVALUES AND EIGENVECTORS FOR EACH MATRIX 
C 

MV = 0 
EPS = 1.0E-6 
DO 100 IJ=!,M 
DO 55 r=i,NCT 
55 COVT(I) * COV(I,IJ) 

CALL EIGEN (COVT,PHI ( 1 , 1 , 1 J) ,N,MV) 

L = 0 

DO 60 I = 1 ,N iT. 

L = L + I 

60 GAM ( I , I J ) = COVT(L) 

C 

C COMPUTE DETERMINANT AND INVERSE OF EACH MATRIX 
C 

DO 65 1=1, NCT 
65 COVT(I) = COV(I,IJ) 

CALL, SMINV(COVT,N,DET(I J) , MV, EPS, IER) 

IF(IER) 1000,70, 1000 
70 CONTINUE 

SDET(IJ) = SQRT(DET(IJ)) 

DO 75 1=1, NCT 

75 COVIN (I , I J) = COVT(I) 

100 CONTINUE 
MV = 0 
DO 105 1=1, M 
105 QPU) = 0-0 
C 

C LOOP ON CLASS I CL 
C 

PC = 0.0 
DO 500 ICL=i,M 
AVEQ = 0.0 
C 

C LOOP ON THE NUMBER OF SAMPLES 
C 

NS = 1000 
DO 300 I J= 1 ,NS 
C 

C GENERATE Y VECTOR FROM CLASS I CL 
C 

DO 110 1=1, N 
CALL RANDU (IX, IY,XP) 

IX = IY , 

CALL NDTRI (XP,Y< I) ,XD, IER) 

110 CONTINUE 
C 

C COMPUTE CONDITIONAL PROBABILITIES: FOR EACH CLASS 
C 

DO 200 JCL= 1 ,M 
IFUCL .EQ. I CL) GO TO 180 
DO 130 1=1, N 
TEl(I) = 0.0 

DEL(I) = AM( I , ICL) - AM( I ,,TCL) 

DO 130 J= 1 ,N 

TE 1 ( I ) = TE1 (I) + SQRT(GAM< J, ICL) )*Y< J)*PHI ( I , J , ICL) 
130 CONTINUE 
JJ = 0 

DO 140 1=1, N 
DO 140 J= 1 , I 
JJ = JJ ♦ 1 

COVU(I,J) = COVIN(JJ,JCL) 

COVU(J,I) = COVIN(JJ, JCL) 

140 CONTINUE 
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Z1 = 0.0 
Z2 = 0.0 
Z3 * 0.0 
DO 150 1 = 1, N 
DO 150 J*1,N 

Zi * Z1 - 0.SnEMI)*COVU(I,J)*fEHJ) 

Z2 » Z2 - TE1 ( I)*COVU( I , J) # DEL(J) 

Z3 * Z3 - 0.5*DEL(I)»COVUU,J)®DEL(J) 

150 CONTINUE 

ZSUM * Zl ♦ Z2 ♦ Z3 

IF (ZSUM .LT. -100) GO TO 190 

BETA * P( JCL) *1.0 

PX(JCL) = BETA*DEXP (Z 1 +Z2+Z3 ) /SDET (JCL) 

IF(PX(JCL) .EQ. 0.0) WRITEC 16,919) ICL, JCL, ZSUM, SDET < JCL) ,PX( JCL) 
170 CONTINUE 
GO TO 200 
180 CONTINUE 
Z0 = 0.0 
DO 185 1=1, N 
Z0 = Z0- 0.5*Y(I)*Y(I) 

185 CONTINUE 

IF(Z0 .LT. -100) GO TO 190 

BETA = P(JCL)*1.0 

PX(JCL) * BETA*DEXP(Z0) /SDET (JCL) 

IF<PX(JCL) .EQ. 0.0) VRITE( 16,919) ICL, JCL,Z0,SDET( JCL) ,PX( JCL) 

GO TO 200 

190 PX(JCL) = 0.0 

200 CONTINUE 

919 F0RMAT(SX,2IS,3E12.4) 

C 

C 

C CHOOSE THE LARGEST 
C 

BIG » -1000 
DO 220 1=1, M 

IF<PX(I) .GT. BIG) LOC = I 
IF(PX< I) .GT. BIG) BIG = PX(I) 

220 CONTINUE 
DEN = 0.0 
DO 230 1=1, M 
DEN = DEN + PX(I> 

230 CONTINUE 

Q = BIG/DEN 
C 

C AVERAGE 
C 

OP (LOC) = QP (LOC) + P(ICL)*Q/P(LOC> 

300 CONTINUE 
500 CONTINUE 

DO 510 ICL=1,M 

PR(ICL) = QP < I CL) /FLOAT (NS) 

PC * PC ♦ P( ICL)*PR( ICL) 

510 CONTINUE 
RETURN 

1000 VRITE(6, 1 100) IER 

1100 FORMATU0X, ’•••INVERSION ERROR( 12, ’)••*’ ) 

STOP 

END 


